Consistent estimation of a mean pattern in deformable models for high-dimensional shape analysis



Jeremie Bigot and Benjamin Charlier 

Institut de Mathematiques de Toulouse 
Universite de Toulouse et CNRS (UMR 5219) 
31062 Toulouse, Cedex 9, France 

{Jeremie.Bigot, Benjamin.Charlier}@math.univ-toulouse.fr

October 26, 2011 



Abstract 

We consider the problem of estimating a mean shape from a set of J planar configurations 
described by a sequence of k landmarks. We study the consistency of a smoothed Procrustean 
mean when the observations obey a deformable model including some nuisance parameters such 
as random translations, rotations and scaling. The main contribution of the paper is to analyze 
the influence of the dimension k of the data and of the number J of observed configurations on 
the convergence of the smoothed Procrustean estimator to the mean pattern of the model. Some 
numerical experiments illustrate these results. 

1 Introduction

1.1 A deformable model for statistical shape analysis 

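Statistical analysis of planar shapes is the study of random (planar) configurations

$$\mathbf{Y} = \begin{pmatrix} Y_1 \\ \vdots \\ Y_k \end{pmatrix} \in \mathbb{R}^{k \times 2}$$

described by a set of $k$ landmarks $Y_\ell = (Y_\ell^{(1)}, Y_\ell^{(2)}) \in \mathbb{R}^2$, $\ell = 1, \dots, k$. Since the seminal work of Kendall [Ken84], one considers that the shape of $\mathbf{Y}$ is "what remains when translations, rotations and scaling are filtered out". More precisely, two configurations $\mathbf{Y}_1, \mathbf{Y}_2 \in \mathbb{R}^{k \times 2}$ are said to have the same shape if there exists a vector $(a, \alpha, b) \in \mathbb{R} \times [-\pi, \pi[ \times \mathbb{R}^2$ such that

$$\mathbf{Y}_2 = e^{a}\, \mathbf{Y}_1 R_\alpha + \mathbf{1}_k \otimes b, \quad \text{with } R_\alpha = \begin{pmatrix} \cos(\alpha) & -\sin(\alpha) \\ \sin(\alpha) & \cos(\alpha) \end{pmatrix}, \tag{1.1}$$

where $\mathbf{1}_k$ denotes the vector of $\mathbb{R}^k$ with all entries equal to one, $\otimes$ denotes the tensor product and $b$ is a row vector.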
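
To make the convention of (1.1) concrete, here is a minimal Python sketch of the similarity action on a configuration. It assumes configurations are stored as lists of (x, y) pairs (the rows of the k x 2 matrix); the names `rotate_row` and `act` are illustrative and do not come from the paper.

```python
import math

def rotate_row(p, alpha):
    """Right-multiply the row vector p = (x, y) by R_alpha of (1.1)."""
    c, s = math.cos(alpha), math.sin(alpha)
    x, y = p
    return (x * c + y * s, -x * s + y * c)

def act(g, config):
    """Return e^a * Y * R_alpha + 1_k (x) b for g = (a, alpha, b)."""
    a, alpha, b = g
    scale = math.exp(a)
    moved = []
    for p in config:
        x, y = rotate_row(p, alpha)
        moved.append((scale * x + b[0], scale * y + b[1]))
    return moved

# Hypothetical example: scaling e^a = 2, rotation parameter pi/2, translation (1, 1).
triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
moved = act((math.log(2.0), math.pi / 2, (1.0, 1.0)), triangle)
```

By construction, `triangle` and `moved` have the same shape in the sense of (1.1). Note that with row vectors multiplied on the right by $R_\alpha$, a positive $\alpha$ turns the landmarks clockwise.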

In shape analysis, an important issue is the computation of a sample mean shape from a set of $J$ random planar configurations $\mathbf{Y}_1, \dots, \mathbf{Y}_J$, and the study of its consistency as the number of samples $J$ goes to infinity. A statistical model of shapes must include some nuisance parameters associated to the ambiguity of location, rotation and scaling. In [Goo91], consistent estimation of a mean shape is therefore considered in the following deformable model:

$$\mathbf{Y}_j = e^{a_j^*} (\mathbf{f} + \zeta_j) R_{\alpha_j^*} + \mathbf{1}_k \otimes b_j^*, \quad \text{with } R_{\alpha_j^*} = \begin{pmatrix} \cos(\alpha_j^*) & -\sin(\alpha_j^*) \\ \sin(\alpha_j^*) & \cos(\alpha_j^*) \end{pmatrix}, \quad j = 1, \dots, J, \tag{1.2}$$

where the mean pattern $\mathbf{f} \in \mathbb{R}^{k \times 2}$ is an unknown configuration of $k$ landmarks which is also called a population mean in [Goo91] or a perturbation mean in [Huc11]. The error terms $\zeta_j \in \mathbb{R}^{k \times 2}$, $j = 1, \dots, J$, are independent copies of a random perturbation $\zeta$ in $\mathbb{R}^{k \times 2}$ with zero expectation. For $j = 1, \dots, J$, the scaling, rotation and translation parameters $(a_j^*, \alpha_j^*, b_j^*) \in \mathbb{R} \times [-\pi, \pi[ \times \mathbb{R}^2$ are independent and identically distributed (i.i.d.) random variables independent of the random perturbations $\zeta_j$.
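
As an illustration of model (1.2), the following Python sketch draws a small synthetic data set. Everything specific in it is an assumption made for the example: the mean pattern is an ellipse sampled on an equi-spaced design, the deformation parameters are uniform on centered intervals, and the perturbations are i.i.d. Gaussian (an isotropic special case, whereas the model allows general covariance). The names `sample_curve` and `simulate` are hypothetical.

```python
import math
import random

def rotate_row(p, alpha):
    """Right-multiply the row vector p = (x, y) by R_alpha."""
    c, s = math.cos(alpha), math.sin(alpha)
    return (p[0] * c + p[1] * s, -p[0] * s + p[1] * c)

def sample_curve(k):
    """Illustrative mean pattern: an ellipse sampled at t = l/k, l = 1..k."""
    return [(2.0 * math.cos(2 * math.pi * l / k), math.sin(2 * math.pi * l / k))
            for l in range(1, k + 1)]

def simulate(f, J, A, calA, B, sigma, rng):
    """Draw Y_j = e^{a_j} (f + zeta_j) R_{alpha_j} + 1_k (x) b_j for j = 1..J."""
    data, params = [], []
    for _ in range(J):
        a = rng.uniform(-A / 2, A / 2)            # centered, bounded scaling
        alpha = rng.uniform(-calA / 2, calA / 2)  # centered, bounded rotation
        b = (rng.uniform(-B, B), rng.uniform(-B, B))
        Y = []
        for (x, y) in f:
            # i.i.d. Gaussian noise, chosen only for simplicity
            noisy = (x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma))
            rx, ry = rotate_row(noisy, alpha)
            Y.append((math.exp(a) * rx + b[0], math.exp(a) * ry + b[1]))
        data.append(Y)
        params.append((a, alpha, b))
    return data, params

rng = random.Random(0)
f = sample_curve(51)
data, params = simulate(f, J=5, A=0.1, calA=0.1, B=1.0, sigma=0.05, rng=rng)
```

Setting `sigma`, `A`, `calA` and `B` to zero returns copies of the mean pattern itself, which is a convenient sanity check.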

According to Goodall [Goo91], a sample mean pattern $\hat{\mathbf{f}}$ computed from $\mathbf{Y}_1, \dots, \mathbf{Y}_J$ is said to be consistent if, as $J \to \infty$, it has asymptotically the same shape as the mean pattern $\mathbf{f}$. Since Goodall's proposal, the deformable model (1.2) has been highly popular in the statistical shape community, and the study of consistent procedures to estimate the shape of the mean pattern $\mathbf{f}$ using this model has been considered by various authors [KM97, Le98, Lel93, Huc11]. In this setting, sample mean patterns obtained by a Procrustes procedure have received special attention. In particular, it is shown in [KM97, Le98] that, in the very specific case of isotropic perturbations $\zeta$, the so-called full and partial Procrustes sample means are consistent estimators of the shape of $\mathbf{f}$. Nevertheless, these estimators can be inconsistent for non-isotropic perturbations. Therefore, it is generally believed that consistent statistical inference based on Procrustes analysis is restricted to very limited assumptions on the distribution of the data, see also [DM98, KBCL99, Huc11] for further discussions.

The aim of this paper is to show that a Procrustes sample mean can be considered as a consistent procedure even in the case of non-isotropic perturbations. For this purpose, we exhibit the relation that exists between the dimensionality $k$ of the data and the consistency of smoothed Procrustes sample means in the perturbation model (1.2). Our main result (see Theorem 1.1 below) is that when the dimensionality $k$ is high and the mean pattern $\mathbf{f}$ is the discretization of a sufficiently smooth plane curve, it is possible to build a consistent estimator of the shape of $\mathbf{f}$ in model (1.2) under general assumptions on the perturbation $\zeta$. The problem of analyzing high-dimensional 2D configurations (i.e. when the number of landmarks $k$ is high) arises in the statistical study of a set of random points in $\mathbb{R}^2$ that have been sampled from planar curves. A typical example is the analysis of contours extracted from digital images.



1.2 Main contributions 

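We consider that the unknown mean pattern $\mathbf{f} \in \mathbb{R}^{k \times 2}$ has been obtained by sampling a planar curve $f : [0, 1] \to \mathbb{R}^2$ on an equi-spaced design, meaning that

$$\mathbf{f} = \left( f\left( \tfrac{\ell}{k} \right) \right)_{\ell = 1}^{k}.$$

Under appropriate smoothness assumptions on $f$, we use a two-step procedure to estimate $\mathbf{f}$. First, we perform a dimension reduction step by projecting the data into a low-dimensional subspace of $\mathbb{R}^{k \times 2}$ to eliminate the influence of the random perturbations $\zeta_j$. Then, in a second step, we apply Procrustes analysis in this low-dimensional space to obtain a consistent estimator of $\mathbf{f}$.

To give a more precise definition of our estimating procedure, introduce the following $k \times k$ matrix

$$A^\lambda = \left( \frac{1}{k} \sum_{|m| \le \lambda} e^{i 2 \pi m \frac{\ell - \ell'}{k}} \right)_{\ell, \ell' = 1}^{k}. \tag{1.3}$$

The matrix $A^\lambda$ is the smoothing matrix corresponding to a discrete Fourier low pass filter with frequency cutoff $\lambda \in \mathbb{N}$. It is a projection matrix in a sub-space $V^\lambda$ of $\mathbb{R}^k$ of dimension $2\lambda + 1$. Then, we project the data on $V^\lambda \times V^\lambda \subset \mathbb{R}^{k \times 2}$, and we estimate the scaling, rotation and translation parameters in model (1.2) using M-estimation as follows: denote the scaling parameters by $a = (a_1, \dots, a_J) \in \mathbb{R}^J$, the rotation parameters by $\alpha = (\alpha_1, \dots, \alpha_J) \in [-\pi, \pi[^J$ and the translation parameters by $b = (b_1, \dots, b_J) \in \mathbb{R}^{2J}$, and introduce the functional,

$$M^\lambda(a, \alpha, b) = \frac{1}{Jk} \sum_{j=1}^{J} \left\| e^{-a_j} A^\lambda (\mathbf{Y}_j - \mathbf{1}_k \otimes b_j) R_{-\alpha_j} - \frac{1}{J} \sum_{j'=1}^{J} e^{-a_{j'}} A^\lambda (\mathbf{Y}_{j'} - \mathbf{1}_k \otimes b_{j'}) R_{-\alpha_{j'}} \right\|^2_{\mathbb{R}^{k \times 2}}, \tag{1.4}$$

where $\|\cdot\|_{\mathbb{R}^{k \times 2}}$ is the standard Euclidean norm in $\mathbb{R}^{k \times 2}$. An M-estimator of

$$(a^*, \alpha^*, b^*) = (a_1^*, \dots, a_J^*, \alpha_1^*, \dots, \alpha_J^*, b_1^*, \dots, b_J^*) \in \mathbb{R}^J \times [-\pi, \pi[^J \times \mathbb{R}^{2J}$$

is given by

$$(\hat{a}^\lambda, \hat{\alpha}^\lambda, \hat{b}^\lambda) \in \operatorname*{argmin}_{(a, \alpha, b) \in \Theta_0} M^\lambda(a, \alpha, b), \tag{1.5}$$

where $(\hat{a}^\lambda, \hat{\alpha}^\lambda, \hat{b}^\lambda) = (\hat{a}_1^\lambda, \dots, \hat{a}_J^\lambda, \hat{\alpha}_1^\lambda, \dots, \hat{\alpha}_J^\lambda, \hat{b}_1^\lambda, \dots, \hat{b}_J^\lambda) \in \mathbb{R}^J \times [-\pi, \pi[^J \times \mathbb{R}^{2J}$ and

$$\Theta_0 = \left\{ (a, \alpha, b) \in [-A, A]^J \times [-\mathcal{A}, \mathcal{A}]^J \times \mathbb{R}^{2J} : \ \sum_{j=1}^{J} a_j = 0, \ \sum_{j=1}^{J} \alpha_j = 0 \ \text{and} \ \sum_{j=1}^{J} b_j = 0 \right\}, \tag{1.6}$$

with $A, \mathcal{A} > 0$ being parameters whose values will be discussed below. Finally, the mean pattern $\mathbf{f}$ is estimated by the following smoothed Procrustes mean

$$\hat{\mathbf{f}}^\lambda = \frac{1}{J} \sum_{j=1}^{J} e^{-\hat{a}_j^\lambda} \left( A^\lambda \mathbf{Y}_j - \mathbf{1}_k \otimes \hat{b}_j^\lambda \right) R_{-\hat{\alpha}_j^\lambda}. \tag{1.7}$$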



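To analyze the convergence of the estimator $\hat{\mathbf{f}}^\lambda$ to the mean pattern $\mathbf{f}$, let us introduce some regularity conditions on the planar curve $f$ and on the covariance structure of the random variable $\zeta$ in $\mathbb{R}^{k \times 2}$.

Let $L > 0$, $s > 0$ and define the Sobolev ball of radius $L$ and degree $s$ as

$$H_s(L) = \left\{ f = (f^{(1)}, f^{(2)}) \in L^2([0, 1], \mathbb{R}^2), \ \sum_{m \in \mathbb{Z}} (1 + |m|^2)^s \left( |c_m(f^{(1)})|^2 + |c_m(f^{(2)})|^2 \right) < L \right\}, \tag{1.8}$$

where $c_m(f) = (c_m(f^{(1)}), c_m(f^{(2)})) = \left( \int_0^1 f^{(1)}(t) e^{-i 2 \pi m t} \, dt, \ \int_0^1 f^{(2)}(t) e^{-i 2 \pi m t} \, dt \right) \in \mathbb{C}^2$ is the $m$-th Fourier coefficient of $f = (f^{(1)}, f^{(2)}) \in L^2([0, 1], \mathbb{R}^2)$, for $m \in \mathbb{Z}$.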

Assumption 1. TTie function f belongs to H S (L) for some L > and s > f . Moreover, the k x 2 
matrix $\mathbf{f} = (f(\ell / k))_{\ell = 1}^{k}$ is of rank two, i.e. there are at least two different landmarks in the $k$-ads composing $\mathbf{f}$.



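Let $\tilde{\zeta} = (\zeta^{(1)}, \zeta^{(2)}) = (\zeta_1^{(1)}, \dots, \zeta_k^{(1)}, \zeta_1^{(2)}, \dots, \zeta_k^{(2)}) \in \mathbb{R}^{2k}$ be the vectorized version of $\zeta = (\zeta_\ell^{(1)}, \zeta_\ell^{(2)})_{\ell = 1}^{k} \in \mathbb{R}^{k \times 2}$.

Assumption 2. The random variable $\tilde{\zeta}$ is a centered Gaussian vector in $\mathbb{R}^{2k}$ with covariance matrix $\Sigma$. Let $\gamma_{\max}(k)$ be the largest eigenvalue of $\Sigma$. Then,

$$\lim_{k \to \infty} \gamma_{\max}(k) \, k^{-\frac{2s}{2s+1}} = 0,$$

where $s$ is the smoothness parameter defined in Assumption 1.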

Note that neither isotropy nor invariance conditions are required on the covariance structure of $\zeta$.
The following theorem is the main result of the paper. 

Theorem 1.1. Consider model (1.2) and suppose that the random variables $(a^*, \alpha^*, b^*)$ are bounded

and belong to [—4, x [— 4, =j] J x [— -B, -B] 2J /or some < A,B and < A < \. If Assumptions^ 

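1 and 2 hold, and if $\lambda(k) = \lfloor k^{\frac{1}{2s+1}} \rfloor$, then for any $J \ge 2$ there exists $(a_0, \alpha_0, b_0) \in \mathbb{R} \times [-\pi, \pi[ \times \mathbb{R}^2$ such that

$$\frac{1}{k} \left\| \hat{\mathbf{f}}^\lambda - \mathbf{f}_{\Theta_0} \right\|^2_{\mathbb{R}^{k \times 2}} \xrightarrow[k \to \infty]{} 0, \quad \text{in probability}, \tag{1.9}$$

where $\mathbf{f}_{\Theta_0} = e^{a_0} \mathbf{f} R_{\alpha_0} + \mathbf{1}_k \otimes b_0$. Suppose, in addition, that the random variables $(a^*, \alpha^*, b^*)$ have zero expectation in $[-\frac{A}{2}, \frac{A}{2}]^J \times [-\frac{\mathcal{A}}{2}, \frac{\mathcal{A}}{2}]^J \times [-B, B]^{2J}$ with $A, \mathcal{A} < 0.1$. Then, we have

$$\frac{1}{k} \left\| \hat{\mathbf{f}}^\lambda - \mathbf{f} \right\|^2_{\mathbb{R}^{k \times 2}} \xrightarrow[k, J \to \infty]{} 0, \quad \text{in probability}. \tag{1.10}$$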

Statement (1.9) means that, under mild assumptions on the covariance structure of the error terms $\zeta$, it is possible to consistently estimate the shape of the mean pattern $\mathbf{f}$ when the number of observations $J$ is fixed and the number $k$ of landmarks increases. Note that $(a_0, \alpha_0, b_0)$ depends on $J$ and is given by formula (3.3) in Section 3. To obtain statement (1.10), we assume the condition $A, \mathcal{A} < 0.1$, which means that the random scaling and rotations in model (1.2) are not too large. Also, it is assumed that random scaling, rotations and translations have zero expectation, meaning that the deformation parameters in model (1.2) are centered around the identity. Then, under such assumptions, statement (1.10) shows that one can consistently estimate the true mean pattern $\mathbf{f}$ when both the sample size $J$ and the number of landmarks $k$ go to infinity. These results are consistent with those obtained in [BC11], where we have studied the consistency of Fréchet means in deformable models for signal and image processing.

1.3 Organization of the paper 

In Section 2, we recall some properties of the similarity group of the plane, and we describe its action on the mean pattern $\mathbf{f}$. Then, we discuss Generalized Procrustes Analysis (GPA) and we compare it to our approach. In Section 3, we discuss some identifiability issues in model (1.2). The estimating procedure is described in detail in Section 4. Consistency results are given in Section 5. Some experiments in Section 6 illustrate the numerical performances of our approach. All the proofs are gathered in a technical appendix.

2 Group structure and Generalized Procrustes Analysis 
2.1 The similarity group 

First let us introduce some notations and definitions that will be useful throughout the paper. The similarity group of the plane is the group $(\mathcal{G}, .)$ generated by isotropic scaling, rotations and translations. The identity element in $\mathcal{G}$ is denoted $e$ and the inverse of $g \in \mathcal{G}$ is denoted by $g^{-1}$. We parametrize the group $\mathcal{G}$ by a scaling parameter $a \in \mathbb{R}$, an angle $\alpha \in [-\pi, \pi[$ and a translation $b \in \mathbb{R}^2$, and we make no difference between $g \in \mathcal{G}$ and its parametrization $(a, \alpha, b) \in \mathbb{R} \times [-\pi, \pi[ \times \mathbb{R}^2$. For all $g_1 = (a_1, \alpha_1, b_1)$, $g_2 = (a_2, \alpha_2, b_2) \in \mathbb{R} \times [-\pi, \pi[ \times \mathbb{R}^2$ we have

$$\begin{aligned} g_1 . g_2 &= (a_1, \alpha_1, b_1).(a_2, \alpha_2, b_2) = (a_1 + a_2, \ \alpha_1 + \alpha_2, \ e^{a_1} b_2 R_{\alpha_1} + b_1), \\ g_1^{-1} &= (a_1, \alpha_1, b_1)^{-1} = (-a_1, \ -\alpha_1, \ -e^{-a_1} b_1 R_{-\alpha_1}), \\ e &= (0, 0, 0). \end{aligned} \tag{2.1}$$
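
The composition law (2.1) can be checked numerically. The Python sketch below is illustrative only: it assumes group elements are stored as tuples `(a, alpha, (b1, b2))`, it does not wrap angles back into $[-\pi, \pi[$, and the function names are hypothetical.

```python
import math

def rotate_row(p, alpha):
    """Right-multiply the row vector p = (x, y) by R_alpha."""
    c, s = math.cos(alpha), math.sin(alpha)
    return (p[0] * c + p[1] * s, -p[0] * s + p[1] * c)

def act(g, config):
    """Action g.x = e^a x R_alpha + 1_k (x) b on a list of (x, y) landmarks."""
    a, alpha, b = g
    out = []
    for p in config:
        x, y = rotate_row(p, alpha)
        out.append((math.exp(a) * x + b[0], math.exp(a) * y + b[1]))
    return out

def compose(g1, g2):
    """g1.g2 = (a1 + a2, alpha1 + alpha2, e^{a1} b2 R_{alpha1} + b1), as in (2.1)."""
    a1, al1, b1 = g1
    a2, al2, b2 = g2
    rx, ry = rotate_row(b2, al1)
    return (a1 + a2, al1 + al2,
            (math.exp(a1) * rx + b1[0], math.exp(a1) * ry + b1[1]))

def inverse(g):
    """g^{-1} = (-a, -alpha, -e^{-a} b R_{-alpha}), as in (2.1)."""
    a, al, b = g
    rx, ry = rotate_row(b, -al)
    return (-a, -al, (-math.exp(-a) * rx, -math.exp(-a) * ry))

g1 = (0.3, 0.5, (1.0, -2.0))
g2 = (-0.1, 1.2, (0.5, 0.25))
x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 3.0)]
direct = act(compose(g1, g2), x)      # (g1.g2).x
two_steps = act(g1, act(g2, x))       # g1.(g2.x)
identity = compose(g1, inverse(g1))   # should be e = (0, 0, (0, 0))
```

The agreement of `direct` and `two_steps` is exactly the compatibility between the group law and the action on configurations.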
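
The action of $\mathcal{G}$ onto $\mathbb{R}^{k \times 2}$ is given by the mapping $(g, x) \mapsto g.x := e^{a} x R_\alpha + \mathbf{1}_k \otimes b$, for $g = (a, \alpha, b) \in \mathcal{G}$ and $x \in \mathbb{R}^{k \times 2}$. Note that we use the same symbol "." for the composition law of $\mathcal{G}$ and its action on $\mathbb{R}^{k \times 2}$. Let

$$\mathbf{1}_{k \times 2} = \mathbf{1}_k \otimes \mathbb{R}^2$$

be the two dimensional linear subspace of $\mathbb{R}^{k \times 2}$ consisting of degenerated configurations, i.e. configurations composed of $k$ times the same landmark. The orthogonal subspace $\mathbf{1}_{k \times 2}^{\perp}$ is the set of centered configurations. We have the orthogonal decomposition $\mathbb{R}^{k \times 2} = \mathbf{1}_{k \times 2}^{\perp} \oplus \mathbf{1}_{k \times 2}$, and for any configuration $x \in \mathbb{R}^{k \times 2}$ we write $x = x_0 + \bar{x} \in \mathbf{1}_{k \times 2}^{\perp} \oplus \mathbf{1}_{k \times 2}$. We call $x_0$ the centered configuration of $x$ and $\bar{x} = \mathbf{1}_k \otimes \left( \frac{1}{k} \sum_{\ell=1}^{k} x_\ell^{(1)}, \ \frac{1}{k} \sum_{\ell=1}^{k} x_\ell^{(2)} \right)$ the degenerated configuration associated to $x$, see Figure 1 for an illustration.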

Definition 2.1. Given a configuration $x$ in $\mathbb{R}^{k \times 2}$, the orbit of $x$ is defined as the set

$$\mathcal{G}.x := \{ g.x, \ g \in \mathcal{G} \} \subset \mathbb{R}^{k \times 2}.$$

This set is also called the shape of $x$.

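Consider now a degenerated configuration $x \in \mathbf{1}_{k \times 2}$. Its orbit $\mathcal{G}.x$ is the entire subspace $\mathbf{1}_{k \times 2}$. Note that the linear subspace $\mathbf{1}_{k \times 2}$ is stable by the action of $\mathcal{G}$, and that the action of $\mathcal{G}$ on $\mathbf{1}_{k \times 2}$ is not free, meaning that $g_1.x = g_2.x$ does not imply that $g_1 = g_2$. Now, if $x \in \mathbb{R}^{k \times 2} \setminus \mathbf{1}_{k \times 2}$ is a non-degenerated configuration of $k$ landmarks, its orbit $\mathcal{G}.x$ is a sub-manifold of $\mathbb{R}^{k \times 2} \setminus \mathbf{1}_{k \times 2}$ of dimension $\dim(\mathcal{G}) = 4$.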

Definition 2.2. Given a configuration $x \in \mathbb{R}^{k \times 2}$, the stabilizer $I(x)$ is the closed subgroup of $\mathcal{G}$ which leaves $x$ invariant, namely

$$I(x) = \{ g \in \mathcal{G} : g.x = x \}.$$

If $x \in \mathbf{1}_{k \times 2}$ is a degenerated configuration, it can be written $x = \mathbf{1}_k \otimes (x^{(1)}, x^{(2)})$ and its stabilizer is non trivial and is equal to $I(x) = \{ (a, \alpha, (x^{(1)}, x^{(2)}) - e^{a} (x^{(1)}, x^{(2)}) R_\alpha), \ a \in \mathbb{R}, \ \alpha \in [-\pi, \pi[ \}$. If $x \in \mathbb{R}^{k \times 2} \setminus \mathbf{1}_{k \times 2}$ is a non-degenerated configuration, its stabilizer $I(x)$ is reduced to the identity $\{e\}$. The action of $\mathcal{G}$ is said to be free if the stabilizer of any point is reduced to the identity. Hence, the action of $\mathcal{G}$ is free on the set of non-degenerated configurations of $k$-ads in $\mathbb{R}^2$.
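
The description of the stabilizer of a degenerated configuration can be verified on a small example. This is a hedged Python sketch: the helper `stabilizer_element` is a hypothetical name, and configurations are assumed to be lists of (x, y) pairs.

```python
import math

def rotate_row(p, alpha):
    """Right-multiply the row vector p = (x, y) by R_alpha."""
    c, s = math.cos(alpha), math.sin(alpha)
    return (p[0] * c + p[1] * s, -p[0] * s + p[1] * c)

def act(g, config):
    """Action g.x = e^a x R_alpha + 1_k (x) b."""
    a, alpha, b = g
    out = []
    for p in config:
        x, y = rotate_row(p, alpha)
        out.append((math.exp(a) * x + b[0], math.exp(a) * y + b[1]))
    return out

def stabilizer_element(p, a, alpha):
    """Element (a, alpha, p - e^a p R_alpha) fixing the configuration 1_k (x) p."""
    rx, ry = rotate_row(p, alpha)
    return (a, alpha, (p[0] - math.exp(a) * rx, p[1] - math.exp(a) * ry))

p = (1.5, -0.5)
degenerate = [p] * 4                      # k = 4 copies of the same landmark
g = stabilizer_element(p, 0.3, 1.1)       # a non-identity group element
fixed = act(g, degenerate)                # equals `degenerate` up to rounding
moved = act(g, [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)])
```

The same element `g` does move a non-degenerated configuration, in line with the freeness of the action outside $\mathbf{1}_{k \times 2}$.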

Definition 2.3. A section of the orbits of $\mathcal{G}$ is a subset of $\mathbb{R}^{k \times 2}$ containing a unique element of each orbit.

Two well-known examples of sections for the similarity group acting on $\mathbb{R}^{k \times 2}$ are the so-called Bookstein's and Kendall's coordinates (see e.g. [DM98] for a precise definition).



2.2 Generalized Procrustes analysis and Kendall's shape space 

Let $x \in \mathbb{R}^{k \times 2} \setminus \mathbf{1}_{k \times 2}$ be a non-degenerated configuration. Let $H = \mathrm{Id}_k - \frac{1}{k} \mathbf{1}_k \mathbf{1}_k'$ be a centering matrix. The effect of translation can be eliminated by centering the configuration $x$ using the matrix $H$ (see [DM98] for other centering methods), while the effect of isotropic scaling is removed by projecting the centered configuration on a unit sphere, which yields the so-called pre-shape $x^0$ of $x$ defined as

$$x^0 = \frac{H x}{\| H x \|_{\mathbb{R}^{k \times 2}}} \in \mathbb{R}^{k \times 2}.$$

Consider now the pre-shape sphere defined by $\mathbb{S}_2^k := \{ x^0, \ x \in \mathbb{R}^{k \times 2} \setminus \mathbf{1}_{k \times 2} \}$ and see Figure 1 for an illustration. Note that this normalization of the planar configurations amounts to choosing a section for the action of the group generated by the translations and scaling in the plane. Kendall's shape space is then defined as the quotient of $\mathbb{S}_2^k$ by the group $SO(2)$ of rotations of the plane, namely

$$\Sigma_2^k := \mathbb{S}_2^k / SO(2) = \{ [x^0] : x^0 \in \mathbb{S}_2^k \} \quad \text{with} \quad [x^0] = \{ x^0 R_\alpha, \ \alpha \in [-\pi, \pi[ \}.$$
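
The pre-shape map is simple to compute. The following Python sketch (the function name `preshape` is an assumption of this example) centers a configuration and rescales it to unit Euclidean norm, then checks that translating and scaling the input leaves the pre-shape unchanged.

```python
import math

def preshape(config):
    """Center a list of (x, y) landmarks and scale it to unit Euclidean norm."""
    k = len(config)
    cx = sum(p[0] for p in config) / k
    cy = sum(p[1] for p in config) / k
    centered = [(x - cx, y - cy) for x, y in config]          # H x
    norm = math.sqrt(sum(x * x + y * y for x, y in centered)) # ||H x||
    if norm == 0.0:
        raise ValueError("degenerated configuration: all landmarks coincide")
    return [(x / norm, y / norm) for x, y in centered]

x = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 3.0)]
# Same configuration after an isotropic scaling by 3 and a translation by (5, -7).
x_moved = [(3.0 * px + 5.0, 3.0 * py - 7.0) for px, py in x]
x0 = preshape(x)
x0_moved = preshape(x_moved)   # coincides with x0 up to rounding
```

Rotations are deliberately not removed at this stage: they are quotiented out only when passing from the pre-shape sphere to Kendall's shape space.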

Figure 1: Three orbits of the action of the similarity group $\mathcal{G}$ are represented in blue. The space of centered configurations $\mathbf{1}_{k \times 2}^{\perp}$ is the green plane. The pre-shape sphere $\mathbb{S}_2^k$ is the red circle. For a particular $x \in \mathbb{R}^{k \times 2}$, the centered version is $x_0$ and the centered and normalized version is $x^0$. The degenerated configuration associated to $x$ is $\bar{x}$.

The space $\Sigma_2^k$ can be endowed with a Riemannian structure and we refer to [KBCL99] for a detailed discussion on its geometric properties.

Let us briefly recall the definition of the so-called partial and full Procrustes distances on $\mathbb{S}_2^k$. The partial Procrustes distance is defined on the pre-shape sphere $\mathbb{S}_2^k$ as

$$d_P^2(x^0, y^0) = \inf_{\alpha \in [-\pi, \pi[} \left\| x^0 - y^0 R_\alpha \right\|^2_{\mathbb{R}^{k \times 2}}, \quad x^0, y^0 \in \mathbb{S}_2^k.$$

Hence, it is the (Euclidean) distance between the orbits $[x^0] = SO(2).x^0$ and $[y^0] = SO(2).y^0$ with $x^0, y^0 \in \mathbb{S}_2^k$. Let now $\mathcal{H}$ be the group of transformations of the plane generated by scaling and rotations. The action of $h \in \mathcal{H}$ on the centered configuration $x^0$ is defined as $h.x^0 := e^{a} x^0 R_\alpha$ where $h = (a, \alpha) \in \mathbb{R} \times [-\pi, \pi[$. The full Procrustes distance is then defined as

$$d_F^2(x^0, y^0) = \inf_{h \in \mathcal{H}} \left\| x^0 - h.y^0 \right\|^2_{\mathbb{R}^{k \times 2}}, \quad x^0, y^0 \in \mathbb{S}_2^k.$$
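
For planar configurations both infima have a closed form once landmarks are encoded as complex numbers: writing $c = |\langle z, w \rangle|$ for the modulus of the Hermitian inner product of the two pre-shapes, one gets $d_P^2 = 2 - 2c$ and $d_F^2 = 1 - c^2$. The Python sketch below uses this standard fact (it is not a formula taken from the paper) and includes a brute-force grid search over rotations as a cross-check; all function names are illustrative.

```python
import cmath
import math

def preshape_complex(config):
    """Pre-shape of a list of (x, y) landmarks, encoded as complex numbers."""
    z = [complex(x, y) for x, y in config]
    mean = sum(z) / len(z)
    z = [v - mean for v in z]
    norm = math.sqrt(sum(abs(v) ** 2 for v in z))
    return [v / norm for v in z]

def procrustes_distances(x, y):
    """Return (d_P^2, d_F^2) between the pre-shapes of x and y."""
    z, w = preshape_complex(x), preshape_complex(y)
    c = abs(sum(zv.conjugate() * wv for zv, wv in zip(z, w)))
    # max(0, .) guards against tiny negative values caused by rounding
    return max(0.0, 2.0 - 2.0 * c), max(0.0, 1.0 - c * c)

def brute_force_partial(x, y, steps=20000):
    """Grid search over rotation angles, used only to cross-check d_P^2."""
    z, w = preshape_complex(x), preshape_complex(y)
    best = float("inf")
    for t in range(steps):
        rot = cmath.exp(1j * 2.0 * math.pi * t / steps)
        best = min(best, sum(abs(zv - wv * rot) ** 2 for zv, wv in zip(z, w)))
    return best

x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.5)]
y = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
d_partial_sq, d_full_sq = procrustes_distances(x, y)
d_partial_sq_grid = brute_force_partial(x, y)
```

Since $1 - c^2 = (1 - c)(1 + c) \le 2(1 - c)$, the full distance never exceeds the partial one, as expected from the extra scaling freedom.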



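The full Procrustes sample mean $\hat{\mathbf{Y}}_F$ of $\mathbf{Y}_1, \dots, \mathbf{Y}_J$ (see e.g. [Goo91, DM98]) is defined by $\hat{\mathbf{Y}}_F \in \operatorname{argmin}_{x^0 \in \mathbb{S}_2^k} \sum_{j=1}^{J} d_F^2(x^0, \mathbf{Y}_j^0)$. It can equivalently be defined by

$$\hat{\mathbf{Y}}_F = \frac{1}{J} \sum_{j=1}^{J} \hat{h}_j . \mathbf{Y}_j^0, \tag{2.2}$$

where $\hat{h}_1, \dots, \hat{h}_J$ are given by the following Procrustes procedure

$$(\hat{h}_1, \dots, \hat{h}_J) \in \operatorname*{argmin}_{h_1, \dots, h_J \in \mathcal{H}} \frac{1}{J} \sum_{j=1}^{J} \left\| h_j . \mathbf{Y}_j^0 - \frac{1}{J} \sum_{j'=1}^{J} h_{j'} . \mathbf{Y}_{j'}^0 \right\|^2_{\mathbb{R}^{k \times 2}} \quad \text{subject to} \quad \left\| \frac{1}{J} \sum_{j=1}^{J} h_j . \mathbf{Y}_j^0 \right\|_{\mathbb{R}^{k \times 2}} = 1. \tag{2.3}$$

Obviously, the mean shape $\mathbf{f}$ in model (1.2) does not necessarily belong to $\mathbb{S}_2^k$, and will generally not have the same orientation as $\hat{\mathbf{Y}}_F$. Therefore, using the Euclidean distance in $\mathbb{R}^{k \times 2}$ to compare $\hat{\mathbf{Y}}_F$ and $\mathbf{f}$ is not meaningful. Moreover, the matching criterion (2.3) is clearly invariant under the action of rotations, meaning that $\hat{\mathbf{Y}}_F R_\alpha$ is a minimizer of (2.3) for any $\alpha \in [-\pi, \pi[$.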

To study the consistency of the sample mean $\hat{\mathbf{Y}}_F$, it is classical in the literature on shape analysis to use Kendall's shape space $\Sigma_2^k$. It has then been proved in [KM97, Le98] that, under mild assumptions on $\mathbf{f}$, $[\hat{\mathbf{Y}}_F]$ converges almost surely to $[\mathbf{f}^0]$ as $J \to +\infty$ when the covariance matrix of the random perturbation $\zeta$ in (1.2) is isotropic (see Proposition 1 in [Le98] for a precise definition of isotropy for random variables belonging to $\mathbb{R}^{k \times 2}$). When the random perturbations are non-isotropic, it has been shown in [KM97, Le98] that $[\hat{\mathbf{Y}}_F]$ does not converge to $[\mathbf{f}^0]$ for some specific configurations of mean shape. Note that in [KM97, Le98], the authors also studied the convergence of the so-called partial Procrustes sample mean defined as

$$\hat{\mathbf{Y}}_P \in \operatorname*{argmin}_{x^0 \in \mathbb{S}_2^k} \frac{1}{J} \sum_{j=1}^{J} d_P^2(x^0, \mathbf{Y}_j^0) = \operatorname*{argmin}_{x^0 \in \mathbb{S}_2^k} \ \min_{(\alpha_1, \dots, \alpha_J) \in [-\pi, \pi[^J} \frac{1}{J} \sum_{j=1}^{J} \left\| x^0 - \mathbf{Y}_j^0 R_{\alpha_j} \right\|^2_{\mathbb{R}^{k \times 2}},$$

and they arrived at the same conclusion on the consistency of $\hat{\mathbf{Y}}_P$. Hence, it is commonly believed that Procrustes sample means can be inconsistent when considering convergence in $\Sigma_2^k$ and the asymptotic setting $J \to +\infty$.

Therefore, our approach and GPA share some similarities. They are both based on the estimation of scaling, rotation and translation parameters by a Procrustean procedure which leads to the M-estimators (1.5) and (2.3). To compute a sample mean shape, this M-estimation step is then followed by a standard empirical mean in $\mathbb{R}^{k \times 2}$ of the aligned data using these estimated deformation parameters, see equations (1.7) and (2.2).

However, one of the main differences between the approach developed in this paper and GPA is the choice of the normalization of the data. In GPA, the deformation parameters $\hat{h}_1, \dots, \hat{h}_J$ are computed so that the full Procrustes sample mean $\hat{\mathbf{Y}}_F$ belongs to the pre-shape sphere $\mathbb{S}_2^k$, see the constraint appearing in (2.3). Therefore, the computation of $\hat{h}_1, \dots, \hat{h}_J$ is somewhat independent of any assumption on the true parameters $(a^*, \alpha^*, b^*)$ in model (1.2). In this paper, to ensure the well-posedness of the problem (1.5), we choose to compute the estimator $(\hat{a}^\lambda, \hat{\alpha}^\lambda, \hat{b}^\lambda)$ by minimizing the matching criterion (1.4) on the constrained set $\Theta_0$. The choice of the constraints in $\Theta_0$ is motivated by the hypothesis that the true deformation parameters $(a^*, \alpha^*, b^*)$ in (1.2) have zero expectation. Another main difference in our approach is the smoothing of the data before applying a Procrustean procedure.






3 Identifiability conditions 



Recall that in model (1.2), the random deformations acting on the mean pattern $\mathbf{f}$ are parametrized by a vector $(a^*, \alpha^*, b^*) = (a_1^*, \dots, a_J^*, \alpha_1^*, \dots, \alpha_J^*, b_1^*, \dots, b_J^*)$ in $\mathbb{R}^J \times [-\pi, \pi[^J \times \mathbb{R}^{2J}$.

Assumption 3. Let $0 < A, B$ and $0 < \mathcal{A} < \pi$ be three real numbers. The deformation parameters $(a_j^*, \alpha_j^*, b_j^*)$ are i.i.d. random variables with zero expectation, taking their values in

$$\left[ -\frac{A}{2}, \frac{A}{2} \right] \times \left[ -\frac{\mathcal{A}}{2}, \frac{\mathcal{A}}{2} \right] \times [-B, B]^2.$$

Let $\Theta^* = [-\frac{A}{2}, \frac{A}{2}]^J \times [-\frac{\mathcal{A}}{2}, \frac{\mathcal{A}}{2}]^J \times [-B, B]^{2J}$. Under Assumption 3, we have $(a^*, \alpha^*, b^*) \in \Theta^*$. Note that searching for estimators of the scaling parameters in a compact interval is an essential condition to ensure the consistency of our procedure. Indeed, the estimation of the deformation parameters $(a^*, \alpha^*, b^*)$ and the mean pattern $\mathbf{f}$ is based on the minimization of the criterion (1.4) over the set $\Theta_0$ defined in (1.6). If there were no restriction on the amplitude of the scaling parameter, the degenerate solution $a_j = -\infty$ for all $j = 1, \dots, J$ would always be a minimizer of (1.4). Therefore, the minimization has to be performed under additional compact constraints.

3.1 The deterministic criterion D

Let $(a, \alpha, b) \in \mathbb{R}^J \times [-\pi, \pi[^J \times \mathbb{R}^{2J}$ and consider the following criterion,

$$D(a, \alpha, b) = \frac{1}{Jk} \sum_{j=1}^{J} \left\| (g_j^{-1} . g_j^*) . \mathbf{f} - \frac{1}{J} \sum_{j'=1}^{J} (g_{j'}^{-1} . g_{j'}^*) . \mathbf{f} \right\|^2_{\mathbb{R}^{k \times 2}}, \tag{3.1}$$

where $g_j = (a_j, \alpha_j, b_j)$ and $g_j^* = (a_j^*, \alpha_j^*, b_j^*)$ for all $j = 1, \dots, J$. The criterion $D$ is a version without noise of the criterion $M^\lambda$ defined at (1.4). The estimation procedure described in Section 1 is based on the convergence of the argmins of $M^\lambda$ toward the argmin of $D$ when $k$ goes to infinity. As a consequence, choosing identifiability conditions amounts to fixing a subset $\Theta_0$ of $\mathbb{R}^J \times [-\pi, \pi[^J \times \mathbb{R}^{2J}$ on which $D$ has a unique argmin. In the rest of this section, we determine the zeros of $D$, and then we fix a convenient constraint set $\Theta_0$ that contains a unique point at which $D$ vanishes.

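The criterion $D$ clearly vanishes at $(a^*, \alpha^*, b^*) \in \mathbb{R}^J \times [-\pi, \pi[^J \times \mathbb{R}^{2J}$. This minimum is not unique since easy algebra implies that

$$D(a, \alpha, b) = 0 \iff (g_j^{-1} . g_j^*) . \mathbf{f} = (g_{j'}^{-1} . g_{j'}^*) . \mathbf{f}, \quad \text{for all } j, j' = 1, \dots, J.$$

Suppose now that $\mathbf{f} \notin \mathbf{1}_{k \times 2}$ is a non-degenerated planar configuration. In Section 2.1, we have seen that the action of $\mathcal{G}$ on $\mathbf{f}$ is free, that is, the stabilizer $I(\mathbf{f})$ is reduced to the identity. Thus, we obtain,

$$D(a, \alpha, b) = 0 \iff g_j^{-1} . g_j^* = g_{j'}^{-1} . g_{j'}^* \ \text{ for all } j, j' = 1, \dots, J \iff \begin{cases} a_j^* - a_j = a_{j'}^* - a_{j'}, \\ \alpha_j^* - \alpha_j = \alpha_{j'}^* - \alpha_{j'}, \\ e^{-a_j} (b_j^* - b_j) R_{-\alpha_j} = e^{-a_{j'}} (b_{j'}^* - b_{j'}) R_{-\alpha_{j'}}, \end{cases} \ \text{ for all } j, j' = 1, \dots, J.$$

We have proved the following result.

Lemma 3.1. Let $\mathbf{f} \in \mathbb{R}^{k \times 2}$ be a non-degenerated configuration of $k$-ads in the plane, i.e. $\mathbf{f} \notin \mathbf{1}_{k \times 2}$. Then, $D(a, \alpha, b) = 0$ if and only if $(a, \alpha, b)$ belongs to the set

$$(a^*, \alpha^*, b^*) * \mathcal{G} = \{ (a^*, \alpha^*, b^*) * (a_0, \alpha_0, b_0), \ (a_0, \alpha_0, b_0) \in \mathbb{R} \times [-\pi, \pi[ \times \mathbb{R}^2 \},$$

where $(a^*, \alpha^*, b^*) * (a_0, \alpha_0, b_0) = (a_1^* + a_0, \dots, a_J^* + a_0, \ \alpha_1^* + \alpha_0, \dots, \alpha_J^* + \alpha_0, \ e^{a_1^*} b_0 R_{\alpha_1^*} + b_1^*, \dots, e^{a_J^*} b_0 R_{\alpha_J^*} + b_J^*) \in \mathbb{R}^J \times [-\pi, \pi[^J \times \mathbb{R}^{2J}$.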
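
Lemma 3.1 can be illustrated numerically: the criterion vanishes along the whole orbit of the true parameters and is positive elsewhere. The Python sketch below is only an illustration; it assumes the $1/(Jk)$ normalization for $D$, stores group elements as tuples `(a, alpha, (b1, b2))`, and uses hypothetical function names.

```python
import math

def rotate_row(p, alpha):
    c, s = math.cos(alpha), math.sin(alpha)
    return (p[0] * c + p[1] * s, -p[0] * s + p[1] * c)

def act(g, config):
    a, alpha, b = g
    out = []
    for p in config:
        x, y = rotate_row(p, alpha)
        out.append((math.exp(a) * x + b[0], math.exp(a) * y + b[1]))
    return out

def compose(g1, g2):
    rx, ry = rotate_row(g2[2], g1[1])
    return (g1[0] + g2[0], g1[1] + g2[1],
            (math.exp(g1[0]) * rx + g1[2][0], math.exp(g1[0]) * ry + g1[2][1]))

def inverse(g):
    rx, ry = rotate_row(g[2], -g[1])
    return (-g[0], -g[1], (-math.exp(-g[0]) * rx, -math.exp(-g[0]) * ry))

def criterion_D(params, true_params, f):
    """D of (3.1): spread of the configurations (g_j^{-1}.g_j^*).f around their mean."""
    J, k = len(params), len(f)
    aligned = [act(compose(inverse(g), gs), f) for g, gs in zip(params, true_params)]
    mean = [(sum(c[l][0] for c in aligned) / J, sum(c[l][1] for c in aligned) / J)
            for l in range(k)]
    total = sum((p[0] - m[0]) ** 2 + (p[1] - m[1]) ** 2
                for c in aligned for p, m in zip(c, mean))
    return total / (J * k)   # 1/(Jk) normalization: an assumption of this sketch

f = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]                   # non-degenerated pattern
true_params = [(0.1, 0.2, (1.0, 0.0)), (-0.05, -0.1, (0.0, 1.0)), (0.0, 0.3, (-1.0, 2.0))]
g0 = (0.4, -0.7, (2.0, -1.0))                              # arbitrary element of the group
on_orbit = [compose(gs, g0) for gs in true_params]         # (g_1^*.g0, ..., g_J^*.g0)
off_orbit = [(0.5, 0.2, (1.0, 0.0))] + true_params[1:]     # only the first scaling changed
```

Evaluating `criterion_D` at `true_params` and at `on_orbit` gives zero up to rounding, while `off_orbit` gives a strictly positive value.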



Remark 1. Lemma 3.1 is simpler than it appears. By reordering the entries of the vector $(a^*, \alpha^*, b^*)$ there is an obvious correspondence between $(a^*, \alpha^*, b^*) \in \Theta^*$ and $(g_1^*, \dots, g_J^*) \in \mathcal{G}^J$ via the parametrization of the similarity group defined in Section 2.1. Hence, Lemma 3.1 tells us that the criterion $D$ vanishes for all the vectors $(a, \alpha, b) \in \mathbb{R}^J \times [-\pi, \pi[^J \times \mathbb{R}^{2J}$ corresponding to the subset of the group $\mathcal{G}^J$ given by

$$(g_1^*, \dots, g_J^*) * \mathcal{G} = \{ (g_1^* . g_0, \dots, g_J^* . g_0), \ g_0 \in \mathcal{G} \} \subset \mathcal{G}^J.$$

The "$*$" notation is nothing else than the right composition by the same $g_0 \in \mathcal{G}$ of all the entries of a $(g_1, \dots, g_J) \in \mathcal{G}^J$. Hence the subset $(g_1^*, \dots, g_J^*) * \mathcal{G}$ can be interpreted as the orbit of $(g_1^*, \dots, g_J^*) \in \mathcal{G}^J$ under the (right) action of $\mathcal{G}$. Indeed, $\mathcal{G}$ acts naturally by (right) composition on all the coordinates of an element of $\mathcal{G}^J$.

Figure 2: Choice of identifiability conditions when J = 2.

3.2 The constraint set $\Theta_0$

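By Lemma 3.1, the set $\Theta_0$ must intersect each set $(a^*, \alpha^*, b^*) * \mathcal{G}$ at a unique point, say $(a^*_{\Theta_0}, \alpha^*_{\Theta_0}, b^*_{\Theta_0})$. It is convenient to choose $\Theta_0$ to be of the form $\Theta_0 = (\mathbb{R}^J \times [-\pi, \pi[^J \times \mathbb{R}^{2J}) \cap \mathcal{L}_0$ where $\mathcal{L}_0$ is a linear space of $\mathbb{R}^{4J}$. The linear space $\mathcal{L}_0$ must be chosen so that for any $(a^*, \alpha^*, b^*)$ in $\Theta^*$, there exists a unique point $(a^*_{\Theta_0}, \alpha^*_{\Theta_0}, b^*_{\Theta_0})$ in $\Theta_0$ that can be written as $(a^*_{\Theta_0}, \alpha^*_{\Theta_0}, b^*_{\Theta_0}) = (a^*, \alpha^*, b^*) * (a_0, \alpha_0, b_0)$ for some $(a_0, \alpha_0, b_0) \in \mathbb{R} \times [-\pi, \pi[ \times \mathbb{R}^2$.

Remark 2. As we have seen in Remark 1, the set $(a^*, \alpha^*, b^*) * \mathcal{G}$ can be interpreted as an orbit of the action of $\mathcal{G}$ on $\mathcal{G}^J$. In this terminology, the set $\Theta_0$ can be viewed as a section of the orbits. Indeed, the section is the set of representatives $(a^*_{\Theta_0}, \alpha^*_{\Theta_0}, b^*_{\Theta_0})$ of each orbit. See Figure 2 for an illustration.

Let us consider a choice of $\Theta_0$ motivated by the fact that, under Assumption 3, the random deformation parameters have zero expectation. In this setting, it is natural to impose that the estimated deformation parameters sum up to zero by choosing $\mathcal{L}_0 = \mathbf{1}_{4J}^{\perp}$, which is the orthogonal of the linear space $\mathbf{1}_{4J} . \mathbb{R} \subset \mathbb{R}^{4J}$. Such a choice leads to the set $\Theta_0$ defined in equation (1.6), i.e.

$$\Theta_0 = \{ (a, \alpha, b) \in [-A, A]^J \times [-\mathcal{A}, \mathcal{A}]^J \times \mathbb{R}^{2J}, \ (a_1 + \dots + a_J, \ \alpha_1 + \dots + \alpha_J, \ b_1 + \dots + b_J) = 0 \}.$$

Now, let us show that for any $(a^*, \alpha^*, b^*) \in \Theta^*$ there exists a unique $(a^*_{\Theta_0}, \alpha^*_{\Theta_0}, b^*_{\Theta_0}) = (a^*, \alpha^*, b^*) * (a_0, \alpha_0, b_0) \in \Theta_0$ for some $(a_0, \alpha_0, b_0) \in \mathbb{R} \times [-\pi, \pi[ \times \mathbb{R}^2$. This amounts to solving the following equations

$$\begin{aligned} a_1^* + a_0 + \dots + a_J^* + a_0 &= 0, \\ \alpha_1^* + \alpha_0 + \dots + \alpha_J^* + \alpha_0 &= 0, \\ e^{a_1^*} b_0 R_{\alpha_1^*} + b_1^* + \dots + e^{a_J^*} b_0 R_{\alpha_J^*} + b_J^* &= 0. \end{aligned} \tag{3.2}$$

After some computations, we obtain that equations (3.2) are satisfied if and only if

$$(a_0, \alpha_0, b_0) = \left( -\bar{a}^*, \ -\bar{\alpha}^*, \ -\bar{b}^* \left( \overline{e^{a^*} R_{\alpha^*}} \right)^{-1} \right), \tag{3.3}$$

where $\bar{a}^* = \frac{1}{J} \sum_{j=1}^{J} a_j^* \in \mathbb{R}$, $\bar{\alpha}^* = \frac{1}{J} \sum_{j=1}^{J} \alpha_j^* \in \mathbb{R}$, $\bar{b}^* = \frac{1}{J} \sum_{j=1}^{J} b_j^* \in \mathbb{R}^2$ and $\overline{e^{a^*} R_{\alpha^*}} = \frac{1}{J} \sum_{j=1}^{J} e^{a_j^*} R_{\alpha_j^*}$ is a $2 \times 2$ invertible matrix. Therefore, $(a^*_{\Theta_0}, \alpha^*_{\Theta_0}, b^*_{\Theta_0})$ is uniquely given by

$$\left( [a^*_{\Theta_0}]_j, \ [\alpha^*_{\Theta_0}]_j, \ [b^*_{\Theta_0}]_j \right) = \left( a_j^* - \bar{a}^*, \ \alpha_j^* - \bar{\alpha}^*, \ b_j^* - e^{a_j^*} \bar{b}^* \left( \overline{e^{a^*} R_{\alpha^*}} \right)^{-1} R_{\alpha_j^*} \right),$$

for $j = 1, \dots, J$.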
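
Formula (3.3) translates directly into a few lines of code. The Python sketch below computes the representative in $\Theta_0$ of a given parameter vector and lets one check that the three sums in (3.2) vanish. The function names are hypothetical, and the box constraints $[-A, A]$ and $[-\mathcal{A}, \mathcal{A}]$ are not enforced here.

```python
import math

def rot_matrix(alpha):
    """R_alpha as a 2 x 2 tuple of rows."""
    c, s = math.cos(alpha), math.sin(alpha)
    return ((c, -s), (s, c))

def row_times(b, M):
    """Row vector b times the 2 x 2 matrix M."""
    return (b[0] * M[0][0] + b[1] * M[1][0], b[0] * M[0][1] + b[1] * M[1][1])

def inverse_2x2(M):
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return ((M[1][1] / det, -M[0][1] / det), (-M[1][0] / det, M[0][0] / det))

def centre_parameters(a, alpha, b):
    """Representative in Theta_0 of the orbit of (a, alpha, b), following (3.3)."""
    J = len(a)
    a_bar, alpha_bar = sum(a) / J, sum(alpha) / J
    b_bar = (sum(p[0] for p in b) / J, sum(p[1] for p in b) / J)
    # Mean of the 2 x 2 matrices e^{a_j} R_{alpha_j}
    M = [[0.0, 0.0], [0.0, 0.0]]
    for aj, alj in zip(a, alpha):
        R = rot_matrix(alj)
        for r in range(2):
            for c in range(2):
                M[r][c] += math.exp(aj) * R[r][c] / J
    t = row_times(b_bar, inverse_2x2(M))
    b0 = (-t[0], -t[1])                       # b_0 of (3.3)
    new_b = []
    for aj, alj, bj in zip(a, alpha, b):
        u = row_times(b0, rot_matrix(alj))    # b_0 R_{alpha_j}
        new_b.append((math.exp(aj) * u[0] + bj[0], math.exp(aj) * u[1] + bj[1]))
    return [aj - a_bar for aj in a], [alj - alpha_bar for alj in alpha], new_b

a_star = [0.04, -0.01, 0.03]
alpha_star = [0.02, 0.05, -0.03]
b_star = [(1.0, 0.5), (-0.3, 0.8), (0.6, -0.2)]
a_c, alpha_c, b_c = centre_parameters(a_star, alpha_star, b_star)
```

When the input already sums to zero in each block, the function returns it unchanged up to rounding, which reflects the uniqueness of the representative.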

Remark 3. Another possible approach is to fix, say, the first observation as a reference, meaning that the criterion $D$ could be optimized on the following subspace of $\mathbb{R}^J \times [-\pi, \pi[^J \times \mathbb{R}^{2J}$:

$$\Theta_1 = \{ (a, \alpha, b) \in [-A, A]^J \times [-\mathcal{A}, \mathcal{A}]^J \times \mathbb{R}^{2J}, \ (a_1, \alpha_1, b_1) = 0 \}.$$

With such a choice, for any $(a^*, \alpha^*, b^*) \in \Theta^*$, the $j$-th coordinate of $(a^*_{\Theta_1}, \alpha^*_{\Theta_1}, b^*_{\Theta_1}) = (a^*, \alpha^*, b^*) * (a_0, \alpha_0, b_0)$ is given by

$$\left( [a^*_{\Theta_1}]_j, \ [\alpha^*_{\Theta_1}]_j, \ [b^*_{\Theta_1}]_j \right) = \left( a_j^* - a_1^*, \ \alpha_j^* - \alpha_1^*, \ b_j^* - e^{a_j^* - a_1^*} b_1^* R_{\alpha_j^* - \alpha_1^*} \right),$$

where $(a_0, \alpha_0, b_0) = (a_1^*, \alpha_1^*, b_1^*)^{-1} = (-a_1^*, -\alpha_1^*, -e^{-a_1^*} b_1^* R_{-\alpha_1^*})$. A graphical illustration of the choice of identifiability conditions for $J = 2$ is given in Figure 2.



4 The estimating procedure 
4.1 A dimension reduction step 

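We use Fourier filtering to project the data into a low-dimensional space as follows. Assume for convenience that $k$ is odd. For $x = (x_1', \dots, x_k')' \in \mathbb{R}^{k \times 2}$ and $m = -\frac{k-1}{2}, \dots, \frac{k-1}{2}$, let

$$c_m(x) = \sum_{\ell=1}^{k} x_\ell \, e^{-i 2 \pi m \frac{\ell}{k}} = \left( \sum_{\ell=1}^{k} x_\ell^{(1)} e^{-i 2 \pi m \frac{\ell}{k}}, \ \sum_{\ell=1}^{k} x_\ell^{(2)} e^{-i 2 \pi m \frac{\ell}{k}} \right) \in \mathbb{C}^2 \quad \text{with } x_\ell = (x_\ell^{(1)}, x_\ell^{(2)}),$$

be the $m$-th (discrete) Fourier coefficient of $x$. Let $\lambda \in \{1, \dots, \frac{k-1}{2}\}$ be a smoothing parameter, and define for each $\mathbf{Y}_j$ the smoothed shapes

$$\hat{\mathbf{f}}_j^\lambda = \left( \frac{1}{k} \sum_{|m| \le \lambda} c_m(\mathbf{Y}_j) \, e^{i 2 \pi m \frac{\ell}{k}} \right)_{\ell=1}^{k} = A^\lambda \mathbf{Y}_j.$$

In Section 2.1, we have shown that the action of the similarity group is not free on the subset $\mathbf{1}_{k \times 2}$ of degenerated configurations composed of $k$ identical landmarks. That is why we are going to treat separately the subspaces $\mathbf{1}_{k \times 2}^{\perp}$ and $\mathbf{1}_{k \times 2}$ by considering the matrices

$$\bar{A} = \frac{1}{k} \mathbf{1}_k \mathbf{1}_k' \quad \text{and} \quad A_0^\lambda = \left( \frac{1}{k} \sum_{0 < |m| \le \lambda} e^{i 2 \pi m \frac{\ell - \ell'}{k}} \right)_{\ell, \ell' = 1}^{k}. \tag{4.1}$$

Remark that $\bar{A}$ is a projection matrix on the one dimensional sub-space $\bar{V} := \mathbf{1}_k . \mathbb{R} = \{ c \mathbf{1}_k : c \in \mathbb{R} \}$ of $\mathbb{R}^k$. The matrix $A_0^\lambda$ is a projection matrix in a (trigonometric) sub-space $V_0^\lambda$ of dimension $2\lambda$. Note that it is included in the linear space $\bar{V}^{\perp} = \{ x \in \mathbb{R}^k : x' \mathbf{1}_k = 0 \}$. Hence, $V_0^\lambda \times V_0^\lambda$ is a linear subspace of $\mathbf{1}_{k \times 2}^{\perp}$, which is the space of the centered configurations. We have,

$$A^\lambda = A_0^\lambda + \bar{A} \quad \text{and} \quad V^\lambda = V_0^\lambda \oplus \bar{V}. \tag{4.2}$$

Thus, we can write the smoothed shape $\hat{\mathbf{f}}_j^\lambda$ as

$$\hat{\mathbf{f}}_j^\lambda = A^\lambda \mathbf{Y}_j = A_0^\lambda \mathbf{Y}_j + \bar{A} \mathbf{Y}_j \in (V_0^\lambda \times V_0^\lambda) \oplus (\bar{V} \times \bar{V}),$$

where $\bar{V} \times \bar{V} = \mathbf{1}_{k \times 2}$ and $V_0^\lambda \times V_0^\lambda \subset \mathbf{1}_{k \times 2}^{\perp}$. In other words, $A_0^\lambda \mathbf{Y}_j$ is the smoothed centered configuration associated to $\mathbf{Y}_j$ and $\bar{A} \mathbf{Y}_j$ is the degenerated configuration given by the Euclidean mean of the $k$ landmarks composing $\mathbf{Y}_j$. Finally, remark that the low pass filter and the action of the similarity group commute, that is, we have for all $g \in \mathcal{G}$ and $\mathbf{f} \in \mathbb{R}^{k \times 2}$

$$g . (A^\lambda \mathbf{f}) = e^{a} A^\lambda \mathbf{f} R_\alpha + \mathbf{1}_k \otimes b = A^\lambda (e^{a} \mathbf{f} R_\alpha + \mathbf{1}_k \otimes b) = A^\lambda (g . \mathbf{f}).$$
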
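
The properties of the low-pass matrices stated above (projection, dimension $2\lambda + 1$, decomposition (4.2), commutation with the group action) can be checked numerically. The Python sketch below builds the matrices in their real form, using the fact that the imaginary parts cancel when summing over symmetric frequencies; the function names and the test configuration are assumptions of the example.

```python
import math

def rotate_row(p, alpha):
    c, s = math.cos(alpha), math.sin(alpha)
    return (p[0] * c + p[1] * s, -p[0] * s + p[1] * c)

def act(g, config):
    """Action g.x = e^a x R_alpha + 1_k (x) b."""
    a, alpha, b = g
    out = []
    for p in config:
        x, y = rotate_row(p, alpha)
        out.append((math.exp(a) * x + b[0], math.exp(a) * y + b[1]))
    return out

def smoothing_matrix(k, lam, keep_zero=True):
    """Entries (1/k) * sum_m cos(2 pi m (l - l') / k) over the kept frequencies.

    keep_zero=True gives A^lambda (|m| <= lam); keep_zero=False gives A_0^lambda.
    """
    freqs = [m for m in range(-lam, lam + 1) if keep_zero or m != 0]
    return [[sum(math.cos(2 * math.pi * m * (l - lp) / k) for m in freqs) / k
             for lp in range(k)] for l in range(k)]

def apply(A, config):
    """Multiply the k x k matrix A with a k x 2 configuration."""
    k = len(config)
    return [(sum(A[l][i] * config[i][0] for i in range(k)),
             sum(A[l][i] * config[i][1] for i in range(k))) for l in range(k)]

k, lam = 11, 2                       # k odd and lam <= (k - 1) / 2
A = smoothing_matrix(k, lam)
A0 = smoothing_matrix(k, lam, keep_zero=False)
config = [(math.cos(2 * math.pi * l / k) + 0.1 * l, math.sin(6 * math.pi * l / k))
          for l in range(1, k + 1)]
g = (0.2, -0.4, (1.0, 3.0))
left = act(g, apply(A, config))      # g.(A^lambda x)
right = apply(A, act(g, config))     # A^lambda (g.x)
```

The translation part commutes because $A^\lambda \mathbf{1}_k = \mathbf{1}_k$: the zero frequency is kept, and all other retained frequencies are orthogonal to constants.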
4.2 Estimation of the deformation parameters 

Recall that the estimator $(\hat{a}^\lambda, \hat{\alpha}^\lambda, \hat{b}^\lambda)$ of $(a^*, \alpha^*, b^*)$ is defined by the optimization problem (1.5). Nevertheless, thanks to the discussion of Sections 2.1 and 4.1, one can carry out the estimation process in two steps. First, we estimate the rotation and scaling parameters on the space $V_0^\lambda \times V_0^\lambda \subset \mathbf{1}_{k \times 2}^{\perp}$ of the centered configurations. We then use these estimators to estimate the translation parameters which act on $\bar{V} \times \bar{V} = \mathbf{1}_{k \times 2}$. Note that this procedure is equivalent to the optimization problem (1.5) as shown by Lemma 4.1 below.

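Estimation of rotations and scaling. Define

$$M_0^\lambda(a, \alpha) = \frac{1}{Jk} \sum_{j=1}^{J} \left\| e^{-a_j} A_0^\lambda \mathbf{Y}_j R_{-\alpha_j} - \frac{1}{J} \sum_{j'=1}^{J} e^{-a_{j'}} A_0^\lambda \mathbf{Y}_{j'} R_{-\alpha_{j'}} \right\|^2_{\mathbb{R}^{k \times 2}}.$$

Let $\mathbf{1}_J^{\perp} = \{ (a_1, \dots, a_J) \in \mathbb{R}^J, \ a_1 + \dots + a_J = 0 \}$ and consider the space $\Theta_0^{a, \alpha} = ([-A, A]^J \cap \mathbf{1}_J^{\perp}) \times ([-\mathcal{A}, \mathcal{A}]^J \cap \mathbf{1}_J^{\perp})$. Then, estimators of the rotation and scaling parameters are given by

$$(\hat{a}^\lambda, \hat{\alpha}^\lambda) \in \operatorname*{argmin}_{(a, \alpha) \in \Theta_0^{a, \alpha}} M_0^\lambda(a, \alpha). \tag{4.3}$$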



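Estimation of translations. Now that we have computed estimators of the rotation and scaling parameters, let us define the criterion,

$$\bar{M}(a, \alpha, b) = \frac{1}{Jk} \sum_{j=1}^{J} \left\| e^{-a_j} \bar{A} (\mathbf{Y}_j - \mathbf{1}_k \otimes b_j) R_{-\alpha_j} - \frac{1}{J} \sum_{j'=1}^{J} e^{-a_{j'}} \bar{A} (\mathbf{Y}_{j'} - \mathbf{1}_k \otimes b_{j'}) R_{-\alpha_{j'}} \right\|^2_{\mathbb{R}^{k \times 2}},$$

and the space $\Theta_0^{b} = \{ (b_1^{(1)}, \dots, b_J^{(1)}, b_1^{(2)}, \dots, b_J^{(2)}) \in \mathbb{R}^{2J}, \ b_1^{(1)} + \dots + b_J^{(1)} = b_1^{(2)} + \dots + b_J^{(2)} = 0 \}$. The estimator of the translation parameters is then given by,

$$\hat{b}^\lambda = \operatorname*{argmin}_{b \in \Theta_0^{b}} \bar{M}(\hat{a}^\lambda, \hat{\alpha}^\lambda, b). \tag{4.4}$$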



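We emphasize that the estimators of the translation parameters depend on the estimated rotation and scaling parameters. It is shown in the proof of Lemma 4.1 below that we have an explicit expression of $\hat{b}^\lambda$ given by

$$\hat{b}^\lambda = \left( \bar{Y}_1 - e^{\hat{a}_1^\lambda} \, \bar{Y} \left( \overline{e^{\hat{a}^\lambda} R_{\hat{\alpha}^\lambda}} \right)^{-1} R_{\hat{\alpha}_1^\lambda}, \ \dots, \ \bar{Y}_J - e^{\hat{a}_J^\lambda} \, \bar{Y} \left( \overline{e^{\hat{a}^\lambda} R_{\hat{\alpha}^\lambda}} \right)^{-1} R_{\hat{\alpha}_J^\lambda} \right), \tag{4.5}$$

where $\overline{e^{\hat{a}^\lambda} R_{\hat{\alpha}^\lambda}} = \frac{1}{J} \sum_{j=1}^{J} e^{\hat{a}_j^\lambda} R_{\hat{\alpha}_j^\lambda} \in \mathbb{R}^{2 \times 2}$, $\bar{Y}_j = \frac{1}{k} \sum_{\ell=1}^{k} [\mathbf{Y}_j]_\ell \in \mathbb{R}^2$, that is $\bar{A} \mathbf{Y}_j = \mathbf{1}_k \otimes \bar{Y}_j \in \mathbb{R}^{k \times 2}$ is a degenerated configuration, and $\bar{Y} = \frac{1}{J} \sum_{j=1}^{J} \bar{Y}_j$.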
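
The explicit expression (4.5) only involves the centroids of the observations and the estimated scalings and rotations. Here is a hedged Python sketch of that computation; the rotation and scaling estimates are taken as given inputs (their computation by minimizing the criterion in (4.3) is not shown), and all names are illustrative.

```python
import math

def rot_matrix(alpha):
    c, s = math.cos(alpha), math.sin(alpha)
    return ((c, -s), (s, c))

def row_times(b, M):
    """Row vector b times the 2 x 2 matrix M."""
    return (b[0] * M[0][0] + b[1] * M[1][0], b[0] * M[0][1] + b[1] * M[1][1])

def inverse_2x2(M):
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return ((M[1][1] / det, -M[0][1] / det), (-M[1][0] / det, M[0][0] / det))

def centroid(config):
    k = len(config)
    return (sum(p[0] for p in config) / k, sum(p[1] for p in config) / k)

def estimate_translations(data, a_hat, alpha_hat):
    """Translation estimates of (4.5) from the data and given (a_hat, alpha_hat)."""
    J = len(data)
    cents = [centroid(Y) for Y in data]                       # bar{Y}_j
    Y_bar = (sum(c[0] for c in cents) / J, sum(c[1] for c in cents) / J)
    M = [[0.0, 0.0], [0.0, 0.0]]                              # mean of e^{a_j} R_{alpha_j}
    for aj, alj in zip(a_hat, alpha_hat):
        R = rot_matrix(alj)
        for r in range(2):
            for c in range(2):
                M[r][c] += math.exp(aj) * R[r][c] / J
    common = row_times(Y_bar, inverse_2x2(M))                 # bar{Y} times inverse of M
    b_hat = []
    for cj, aj, alj in zip(cents, a_hat, alpha_hat):
        t = row_times(common, rot_matrix(alj))
        b_hat.append((cj[0] - math.exp(aj) * t[0], cj[1] - math.exp(aj) * t[1]))
    return b_hat

data = [[(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)],
        [(1.0, 1.0), (4.0, 1.0), (2.0, 5.0)],
        [(-1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]]
a_hat = [0.05, -0.02, -0.03]
alpha_hat = [0.1, -0.04, -0.06]
b_hat = estimate_translations(data, a_hat, alpha_hat)
```

With these estimates the aligned centroids $e^{-\hat{a}_j} (\bar{Y}_j - \hat{b}_j) R_{-\hat{\alpha}_j}$ coincide for all $j$, so the translation criterion vanishes, and the estimated translations sum to zero as required by the constraint set.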

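This two-step procedure is equivalent to the optimization problem (1.5) as we have the following decompositions $M^\lambda(a, \alpha, b) = M_0^\lambda(a, \alpha) + \bar{M}(a, \alpha, b)$ and $\Theta_0 = \Theta_0^{a, \alpha} \times \Theta_0^{b}$, implying the following result (see the Appendix for a detailed proof).

Lemma 4.1. Let $(\hat{a}^\lambda, \hat{\alpha}^\lambda) \in \operatorname{argmin}_{(a, \alpha) \in \Theta_0^{a, \alpha}} M_0^\lambda(a, \alpha)$ and $\hat{b}^\lambda = \operatorname{argmin}_{b \in \Theta_0^{b}} \bar{M}(\hat{a}^\lambda, \hat{\alpha}^\lambda, b)$. Then,

$$(\hat{a}^\lambda, \hat{\alpha}^\lambda, \hat{b}^\lambda) \in \operatorname*{argmin}_{(a, \alpha, b) \in \Theta_0} M^\lambda(a, \alpha, b).$$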



5 Consistency results 

In what follows, C,Cq,C\ denote positive constants whose value may change from line to line. The 
notation C(-) specifies the dependency of C on some quantities. 



5.1 Consistent estimation of the deformation parameters 

Rotation and scaling. Recall that the rotation and scaling parameters are estimated separately on 
the smoothed and centered observations. We have the following result, 

Theorem 5.1. Consider model (1.2) and suppose that Assumptions 1 and 2 hold and that Assumption 3 is verified with $\max\{A, \mathcal{A}\} < 0.1$. Consider $(\hat{a}^\lambda, \hat{\alpha}^\lambda)$ the estimators defined in equation (4.3). If $\lambda = k^{\frac{1}{2s+1}}$ then for all $x > 0$ we have

P( ~||(a* a A )-(^a*)||* aJ ^ <4e~ x , (5.1) 
where Ci(L,s,A,A,fo),C 2 (A,A) > are positive constants independent of k and J, Ai(k,J,x) = 
F(k-^)+F(V 1 (k,J,x)) with V x {k, J, x) = 3 7max (fc)AT^(l + v /2^ + 2-^) andA 2 (J,x) = (p + 
/ — \ 2 

) , where F(u) = u + \/^> / or a ^ u > 0. 

Remark that a direct consequence of Theorem 5.1 is the consistency of $(\hat a^{\lambda},\hat\alpha^{\lambda})$ to $(a^*,\alpha^*)$ when $k, J \to \infty$. Indeed, we have $\lim_{u\to 0} F(u) = 0$ and, under Assumption 2 and for any fixed $J \ge 1$ and $x > 0$, the term $V_1(k,J,x)$ tends to zero as $k$ goes to infinity. Hence for any $x > 0$ and $J \ge 1$, we have $\lim_{k\to\infty} A_1(k,J,x) = 0$. Moreover, for all $x > 0$, we have $\lim_{J\to\infty} A_2(J,x) = 0$. Thus the double asymptotic $\min\{k,J\} \to \infty$ ensures that $\frac{1}{J}\|(\hat a^{\lambda},\hat\alpha^{\lambda}) - (a^*,\alpha^*)\|^2_{\mathbb{R}^{2J}}$ tends to 0 in probability.

Note also that Proposition A.1 (see the Appendix) ensures the convergence of $(\hat a^{\lambda},\hat\alpha^{\lambda})$ to $(a^*_{\Theta_0},\alpha^*_{\Theta_0})$ as $J$ remains fixed and $k \to \infty$. It should also be mentioned that the condition $\max\{A,\mathcal{A}\} < 0.1$ is not needed merely to ensure the convergence in probability of $(\hat a^{\lambda},\hat\alpha^{\lambda})$ to $(a^*_{\Theta_0},\alpha^*_{\Theta_0})$ in the setting $J$ fixed and $k \to \infty$. However, this extra condition on $A$ and $\mathcal{A}$ is needed to derive a rate of convergence (depending explicitly on $k$ and $J$).
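To get a feel for how the rate terms behave, the following minimal sketch evaluates $F$, $V_1$, $A_1$ and $A_2$ as they read in Theorem 5.1. The smoothness index `s = 2`, the level `x = 1` and the value `gamma_max = 4` (the white noise case of Section 6.1) are illustrative assumptions, not values prescribed by the theorem.

```python
import math

def F(u):
    """F(u) = u + sqrt(u), for u >= 0."""
    return u + math.sqrt(u)

def V1(k, J, x, s, gamma_max):
    """V_1(k,J,x) = 3 gamma_max k^(-2s/(2s+1)) (1 + sqrt(2x/J) + 2x/J)."""
    r = x / J
    rate = k ** (-2.0 * s / (2.0 * s + 1.0))
    return 3.0 * gamma_max * rate * (1.0 + math.sqrt(2.0 * r) + 2.0 * r)

def A1(k, J, x, s, gamma_max):
    """A_1(k,J,x) = F(k^(-2s/(2s+1))) + F(V_1(k,J,x)); vanishes as k grows."""
    return F(k ** (-2.0 * s / (2.0 * s + 1.0))) + F(V1(k, J, x, s, gamma_max))

def A2(J, x):
    """A_2(J,x) = (sqrt(2x/J) + x/(3J))^2; vanishes as J grows."""
    return (math.sqrt(2.0 * x / J) + x / (3.0 * J)) ** 2

if __name__ == "__main__":
    s, gamma_max, x = 2.0, 4.0, 1.0  # hypothetical illustrative values
    for J in (10, 100, 500):
        for k in (20, 100, 1000, 3000):
            print(J, k, round(A1(k, J, x, s, gamma_max), 4), round(A2(J, x), 4))
```

With a bounded $\gamma_{\max}(k)$, the printed values of $A_1$ decrease along $k$ for each fixed $J$, while $A_2$ only decreases with $J$, which is the double asymptotic discussed above.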



Translation parameters. We have the following result.

Theorem 5.2. Consider the hypothesis and notations of Theorem 5.1 and the estimator $\hat b^{\lambda}$ given by formula (4.4). Then we have for all $x > 0$,

$$\mathbb{P}\Big(\frac{1}{J}\|\hat b^{\lambda} - b^*\|^2_{\mathbb{R}^{2J}} \ge C_3(L,s,A,\mathcal{A},B,f)\,A_3(k,J,x) + C_4(A,\mathcal{A},B)\,A_2(J,x)\Big) \le 9e^{-x}, \tag{5.2}$$

where $C_3(L,s,A,\mathcal{A},B,f), C_4(A,\mathcal{A},B) > 0$ are positive constants independent of $k$ and $J$, $A_3(k,J,x) = F\big(k^{-\frac{2s}{2s+1}}\big) + F(V_1(k,J,x)) + V_2(k,J,x)$ with $V_2(k,J,x) = \frac{\gamma_{\max}(k)}{k}\Big(1 + \sqrt{\tfrac{2x}{J}} + \tfrac{x}{J}\Big)$.

Similar comments to those made after Theorem 5.1 are still valid here. For any $J \in \mathbb{N}$ and $x > 0$, we have $\lim_{k\to\infty} A_3(k,J,x) = 0$, since $V_2(k,J,x)$ tends to 0 as $k$ goes to infinity by Assumption 2. In the double asymptotic setting $k, J \to \infty$, Theorem 5.2 ensures the consistency of $\hat b^{\lambda}$ to the true value $b^*$ of the translation parameters. Remark also that Proposition A.3 (see the Appendix) ensures the convergence in probability of $\hat b^{\lambda}$ to $b^*_{\Theta_0}$ with only an asymptotic in $k$ (with fixed $J$).

5.2 Consistent estimation of the mean shape 

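Theorem 5.3. Consider model (1.2) and suppose that Assumptions 1 and 2 hold and that Assumption 3 is verified with $\max\{A,\mathcal{A}\} < 0.1$. Consider the estimator $\hat f^{\lambda}$ defined in (1.7) and let $\lambda = k^{\frac{1}{2s+1}}$. Then, we have for all $x > 0$,

$$\mathbb{P}\Big(\frac{1}{k}\|\hat f^{\lambda} - f\|^2_{\mathbb{R}^{k\times 2}} \ge C(L,s,A,\mathcal{A},B,f)\big(A_1(k,J,x) + A_3(k,J,x) + A_2(J,x)\big)\Big) \le 14e^{-x},$$

where $C(L,s,A,\mathcal{A},B,f) > 0$ is a constant independent of $k$ and $J$, $A_1(k,J,x) = F\big(k^{-\frac{2s}{2s+1}}\big) + F(V_1(k,J,x))$ with $V_1(k,J,x) = 3\gamma_{\max}(k)\,k^{-\frac{2s}{2s+1}}\Big(1 + \sqrt{\tfrac{2x}{J}} + \tfrac{2x}{J}\Big)$, $A_3(k,J,x) = V_2(k,J,x) + F\big(k^{-\frac{2s}{2s+1}}\big) + F(V_1(k,J,x))$ with $V_2(k,J,x) = \frac{\gamma_{\max}(k)}{k}\Big(1 + \sqrt{\tfrac{2x}{J}} + \tfrac{x}{J}\Big)$, and $A_2(J,x) = \Big(\sqrt{\tfrac{2x}{J}} + \tfrac{x}{3J}\Big)^2$.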

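The terms $A_1(k,J,x)$, $A_3(k,J,x)$ and $A_2(J,x)$ that appear in the statement of Theorem 5.3 are the same as those appearing in Theorems 5.1 and 5.2. As a consequence, we have $\frac{1}{k}\|\hat f^{\lambda} - f\|^2_{\mathbb{R}^{k\times 2}} \to 0$ in probability when $\min\{k,J\} \to \infty$, see the comments after Theorems 5.1 and 5.2. Let

$$f_{\Theta_0} = e^{\bar a^*}\Big(f + \mathbf{1}_k \otimes \bar b^*\big(\overline{e^{a^*}R_{\alpha^*}}\big)^{-1}\Big) R_{\bar\alpha^*}, \tag{5.3}$$

where $\bar a^* = \frac{1}{J}\sum_{j=1}^{J} a_j^* \in \mathbb{R}$, $\bar\alpha^* = \frac{1}{J}\sum_{j=1}^{J} \alpha_j^* \in \mathbb{R}$, $\bar b^* = \frac{1}{J}\sum_{j=1}^{J} b_j^* \in \mathbb{R}^2$ and $\overline{e^{a^*}R_{\alpha^*}} = \frac{1}{J}\sum_{j=1}^{J} e^{a_j^*}R_{\alpha_j^*}$ is an invertible $2\times 2$ matrix, see also formula (3.3). A slight modification of the proof of Theorem 5.3 gives the following inequality,

$$\mathbb{P}\Big(\frac{1}{k}\|\hat f^{\lambda} - f_{\Theta_0}\|^2_{\mathbb{R}^{k\times 2}} \ge C(L,s,A,\mathcal{A},B,f)\big(A_1(k,J,x) + A_3(k,J,x)\big)\Big) \le 14e^{-x}.$$

Figure 3: (a) Plot of the mean pattern $f = (f^{(1)}, f^{(2)})$ used in the simulations with $k = 1024$. (b) The first coordinates $f^{(1)}$. (c) The second coordinates $f^{(2)}$.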

Note that Theorem 5.3 gives a rate of convergence of $\hat f^{\lambda}$ to $f$ thanks to a concentration inequality. As for the estimation of the deformation parameters, the condition $\max\{A,\mathcal{A}\} < 0.1$ is not needed merely to ensure the convergence in probability of $\hat f^{\lambda}$ to $f_{\Theta_0}$ in the setting $J$ fixed and $k \to \infty$. However, this extra condition on $A$ and $\mathcal{A}$ is needed to derive a rate of convergence (depending explicitly on $k$ and $J$) of $\hat f^{\lambda}$ to $f_{\Theta_0}$.

6 Numerical experiments 
6.1 Description of the data 

We now present some numerical simulations to show the effect of the dimension $k$ and of the number $J$ of observations on the estimation of the deformation parameters and of the mean pattern, with data generated by model (1.2). Different types of noise are considered. For all $t \in [0,1]$, let

$$f(t) = \big(10\sin^2(\pi t) + \cos(10\pi t) + 20,\; 2\sin(6\pi t) - 11\sin^2(\pi t) + 12\exp(-25(t-0.4)^2) + 1\big).$$

This curve is plotted in Figure 3. The deformation parameters $(a_j^*,\alpha_j^*,b_j^*)$, $j = 1,\ldots,J$, are i.i.d.
uniform random variables taking their values in G = [—3,3] x [-5,5] x [— 1,1] 2 - The law of the 
deformation parameters is supposed to be unknown a priori and the minimization is performed on the 
constraint set 

O = \ (a, a, 6) G [-1,1] J x x R 2J , = 0, ^a i = 0and ^6,=oL 

Recall our notations: the error term is denoted by $\zeta = (\zeta^{(1)},\zeta^{(2)}) \in \mathbb{R}^{k\times 2}$ and the vectorized version of $\zeta$ is denoted by $\tilde\zeta \in \mathbb{R}^{2k}$. The simulations were run with three different kinds of noise.

White noise: the random variable $\tilde\zeta = (\zeta^{(1)}_1,\ldots,\zeta^{(1)}_k,\zeta^{(2)}_1,\ldots,\zeta^{(2)}_k)' \in \mathbb{R}^{2k}$ is a centered Gaussian vector of variance

$$\Sigma_1 = 4\,\mathrm{Id}_{2k}.$$

We have $\gamma_{\max}(k) = 4$ and this corresponds to an isotropic Gaussian noise as in [KM97, Le98], see Figure 4.
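As an illustration of how such data could be simulated in the white noise case, here is a minimal sketch. It assumes that model (1.2) reads $Y_j = e^{a_j^*}(f + \zeta_j)R_{\alpha_j^*} + \mathbf{1}_k \otimes b_j^*$ (as suggested by the expression of $\bar Y_j$ in the proof of Proposition A.3), that landmarks are sampled at $t_\ell = \ell/k$, and that points are row vectors multiplied on the right by $R_\alpha$. The parameter ranges in `simulate` and all function names are hypothetical choices made for this example only.

```python
import math
import random

def mean_pattern(k):
    """Sample the planar curve f at t_l = l/k, l = 1..k (assumed sampling grid)."""
    pts = []
    for l in range(1, k + 1):
        t = l / k
        f1 = 10 * math.sin(math.pi * t) ** 2 + math.cos(10 * math.pi * t) + 20
        f2 = (2 * math.sin(6 * math.pi * t) - 11 * math.sin(math.pi * t) ** 2
              + 12 * math.exp(-25 * (t - 0.4) ** 2) + 1)
        pts.append((f1, f2))
    return pts

def deform(points, a, alpha, b):
    """Apply y -> e^a * y R_alpha + b, with R_alpha = [[cos, -sin], [sin, cos]]."""
    c, s, e = math.cos(alpha), math.sin(alpha), math.exp(a)
    return [(e * (y1 * c + y2 * s) + b[0], e * (-y1 * s + y2 * c) + b[1])
            for y1, y2 in points]

def undo(points, a, alpha, b):
    """Inverse action y -> e^{-a} (y - b) R_{-alpha}."""
    c, s, e = math.cos(alpha), math.sin(alpha), math.exp(-a)
    return [(e * ((y1 - b[0]) * c - (y2 - b[1]) * s),
             e * ((y1 - b[0]) * s + (y2 - b[1]) * c))
            for y1, y2 in points]

def simulate(k, J, sigma2=4.0, seed=0):
    """Draw J noisy deformed copies of f with isotropic noise of variance sigma2."""
    rng = random.Random(seed)
    f, sd = mean_pattern(k), math.sqrt(sigma2)
    data, params = [], []
    for _ in range(J):
        a = rng.uniform(-0.1, 0.1)        # hypothetical scaling range
        alpha = rng.uniform(-0.1, 0.1)    # hypothetical rotation range
        b = (rng.uniform(-1, 1), rng.uniform(-1, 1))  # hypothetical translations
        noisy = [(p1 + rng.gauss(0, sd), p2 + rng.gauss(0, sd)) for p1, p2 in f]
        data.append(deform(noisy, a, alpha, b))
        params.append((a, alpha, b))
    return f, data, params
```

Averaging `undo(Y_j, a_j, alpha_j, b_j)` over `j` with the true parameters gives the oracle version of the Procrustes mean; the estimators studied here replace the true parameters by their estimates.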
Stationary noise: the random variable $\tilde\zeta$ is a centered Gaussian vector of variance $\Sigma_2$ with

I 2N 1 k 



=1 



7d, 



exp 



100 



=1 



Hence, $\Sigma_2$ is a Toeplitz matrix and it follows from classical matrix theory, see e.g. [HJ90], that $\gamma_{\max}(k)$ is bounded (here $\gamma_{\max} < 80$). See Figure 5.

Correlated noise : the random variable £ is a centered Gaussian vector of variance 

S 3 = /d 2 ®PDiag(g, £=1,...,*;)^ 

where P is an arbitrary matrix in SO(fc). Hence, in this case 7 max (&) = \ and the level of noise 
increases with $k$, see Figure 6.



6.2 Description of the procedure 

The estimation procedure follows the guidelines described in Section 4. We test the effect of the number $J$ of observations and of the number $k$ of landmarks on the estimation of the parameters of interest of model (1.2). All the simulations are performed with $J = 10, 100, 500$ and $k = 20, 50, 100, 1000, 3000$, and for each combination of these two factors the simulations are performed with $M = 30$ repetitions of model (1.2).

Moreover, the estimation is done both without and with the pre-smoothing step. In the former case we have $\lambda(k) = \frac{k}{2}$, that is, there is no reduction of the dimension. In the latter case, the smoothing parameter $\lambda(k)$ is fixed manually to ensure a proper reconstruction of the mean pattern $f$. Note that
we need A > 7 to get correct results and we took A20 = A50 = A100 = 7, A1000 = H and A3000 = 25. 
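The pre-smoothing step can be sketched as a Fourier low-pass projection applied to each coordinate of a configuration. The sketch below assumes that the smoothing operator keeps the discrete Fourier frequencies $|m| \le \lambda$ (consistent with the sums over $|m| > \lambda$ in the proof of Lemma B.1); the function name `low_pass` is hypothetical, and the naive $O(k^2)$ transform is used only to keep the example free of external libraries.

```python
import cmath
import math

def low_pass(values, lam):
    """Project a real sequence of length k onto the Fourier frequencies |m| <= lam.

    Requires 2*lam + 1 <= k so that the retained frequencies are distinct.
    """
    k = len(values)
    if 2 * lam + 1 > k:
        raise ValueError("need 2*lam + 1 <= k")
    out = [0.0] * k
    for m in range(-lam, lam + 1):
        # m-th discrete Fourier coefficient of the sequence
        c = sum(v * cmath.exp(-2j * math.pi * m * l / k)
                for l, v in enumerate(values)) / k
        for l in range(k):
            # contributions of m and -m are conjugate, so summing real parts is exact
            out[l] += (c * cmath.exp(2j * math.pi * m * l / k)).real
    return out

def smooth_configuration(points, lam):
    """Apply the low-pass projection separately to both coordinates of a k x 2 configuration."""
    xs = low_pass([p[0] for p in points], lam)
    ys = low_pass([p[1] for p in points], lam)
    return list(zip(xs, ys))
```

Since the frequency $m = 0$ is retained, the projection leaves the mean of each coordinate unchanged, which matches the fact that degenerate configurations $\mathbf{1}_k \otimes b$ are invariant under the smoothing.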

We use a quasi-Newton trust-region based algorithm to solve the optimization problems (4.3) and (4.4). The formula for the gradient is given in (B.1). All the computations are performed with Matlab.

6.3 Results: estimation of the mean pattern 

For each of the 30 repetitions of model (1.2) with the possible values of $k$ and $J$, we compute the quantities $\frac{1}{k}\|\hat f^{\lambda} - f_{\Theta_0}\|^2_{\mathbb{R}^{k\times 2}}$, where $\hat f^{\lambda}$ corresponds to the smoothed Procrustes mean of the observations defined in (1.7), and $\frac{1}{k}\|\hat f - f_{\Theta_0}\|^2_{\mathbb{R}^{k\times 2}}$, where $\hat f$ is the (non-smoothed) Procrustes mean of the data. Recall that $f_{\Theta_0}$ is defined by formula (5.3).

Boxplots of the results are given in Figures 7, 8 and 9 for the different kinds of error terms described in Section 6.1. In the figures, the abscissa represents the different values of the number $k$ of landmarks; boxplots in red correspond to $J = 10$ observations, in green to $J = 100$ observations and in blue to $J = 500$ observations.

The estimation of the mean pattern with the white noise error term is given in Figure 7. In Figure 7a, for a fixed $k$, the non-smoothed version $\frac{1}{k}\|\hat f - f_{\Theta_0}\|^2_{\mathbb{R}^{k\times 2}}$ decreases when $J$ increases. Moreover, the values of $\frac{1}{k}\|\hat f - f_{\Theta_0}\|^2_{\mathbb{R}^{k\times 2}}$ remain stable when $J$ remains fixed and $k$ increases. Recall that this framework corresponds to the isotropic Gaussian noise described in [KM97]. The simulations seem to confirm their conclusions and show that in this framework the dimension $k$ is not preponderant. In Figure 7b, the smoothed version $\frac{1}{k}\|\hat f^{\lambda} - f_{\Theta_0}\|^2_{\mathbb{R}^{k\times 2}}$ decreases when $J$ and $k$ increase. The main difference with the non-smoothed estimation is the convergence to 0 of $\frac{1}{k}\|\hat f^{\lambda} - f_{\Theta_0}\|^2_{\mathbb{R}^{k\times 2}}$ when $J$ remains fixed and $k$ increases.

In Figure 8, the results of the estimation of the mean pattern are plotted for the stationary noise term. Figure 8a shows a similar behavior of the non-smoothed Procrustes mean, but with non-decreasing values of $\frac{1}{k}\|\hat f - f_{\Theta_0}\|^2_{\mathbb{R}^{k\times 2}}$ when $k$ increases and $J$ remains fixed. In Figure 8b, the smoothed Procrustes mean converges as $k$ goes to infinity, and the larger $J$, the faster the convergence.

The results of the estimation of the mean pattern with the correlated noise are presented in Figure 9. The results that appear in Figure 9a are quite different from those presented in Figures 7a and 8a. The estimation seems to get worse when $k$ increases and $J$ remains fixed. The reason is that the level of noise, measured by $\gamma_{\max}(k)$, is increasing with $k$. The smoothing step is efficient and the estimations presented in Figure 9b have a similar behavior to those of Figures 7b and 8b.

Figure 4: Example of data generated by model (1.2) with white noise in the case $k = 300$. (a) Observations $Y_1, Y_2, Y_3$. (b) A realization of $\zeta_1$. (c) Spectrum of $\Sigma_1$.

Figure 5: Example of data generated by model (1.2) with stationary noise in the case $k = 300$. (a) Observations $Y_1, Y_2, Y_3$. (b) A realization of $\zeta_1$. (c) Spectrum of $\Sigma_2$.

Figure 6: Example of observations generated by model (1.2) with the correlated noise in the case $k = 300$. (b) A realization of $\zeta_1$. (c) Spectrum of $\Sigma_3$.



A Proofs 

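A.1 Proof of Lemma 4.1

Using the decomposition (4.2), we obtain the following identity

$$M^{\lambda}(a,\alpha,b) = M_0^{\lambda}(a,\alpha) + M(a,\alpha,b). \tag{A.1}$$

Note that we have used the fact that the subspaces $V_0$ and $V$ are orthogonal and that $A^{\lambda}(\mathbf{1}_k \otimes b_j) = \mathbf{1}_k \otimes b_j$ for all $b_j \in \mathbb{R}^2$. Let us also introduce the notation $\bar Y_j = \frac{1}{k}\sum_{\ell=1}^{k}[Y_j]_\ell \in \mathbb{R}^2$, that is, $\mathbf{1}_k \otimes \bar Y_j \in \mathbb{R}^{k\times 2}$ is a degenerated configuration, and $\bar Y = \frac{1}{J}\sum_{j=1}^{J}\bar Y_j \in \mathbb{R}^2$. For a fixed $(a,\alpha) \in \mathbb{R}^J \times [-\pi,\pi[^J$, the functional $b \mapsto M(a,\alpha,b)$ vanishes if and only if there exists a $b_0 \in \mathbb{R}^2$ such that $e^{-a_j}(\mathbf{1}_k \otimes \bar Y_j - \mathbf{1}_k \otimes b_j)R_{-\alpha_j} = \mathbf{1}_k \otimes b_0$ for all $j = 1,\ldots,J$. Therefore, for this fixed $(a,\alpha)$, there is a unique point

$$b = b(a,\alpha) := \Big(\bar Y_1 - e^{a_1}\bar Y\big(\overline{e^{a}R_{\alpha}}\big)^{-1}R_{\alpha_1}, \ldots, \bar Y_J - e^{a_J}\bar Y\big(\overline{e^{a}R_{\alpha}}\big)^{-1}R_{\alpha_J}\Big) \in \Theta_0^{b},$$

with $\overline{e^{a}R_{\alpha}} = \frac{1}{J}\sum_{j=1}^{J} e^{a_j}R_{\alpha_j} \in \mathbb{R}^{2\times 2}$, which satisfies

$$b(a,\alpha) = \operatorname{argmin}_{b\in\Theta_0^{b}} M(a,\alpha,b).$$

Thence, thanks to the decomposition (A.1) and the fact that $M(a,\alpha,b(a,\alpha)) = 0$, we have

$$\operatorname{argmin}_{(a,\alpha,b)\in\Theta_0} M^{\lambda}(a,\alpha,b) = \Big(\operatorname{argmin}_{(a,\alpha)\in\Theta_0^{a,\alpha}} M_0^{\lambda}(a,\alpha),\; b\big(\operatorname{argmin}_{(a,\alpha)\in\Theta_0^{a,\alpha}} M_0^{\lambda}(a,\alpha)\big)\Big),$$

and the claim is proved. □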
A.2 Proof of Theorem 5.1

For all $(\hat a^{\lambda},\hat\alpha^{\lambda}) \in \Theta_0^{a,\alpha}$ and $(a^*,\alpha^*) \in [-A,A]^J \times [-\mathcal{A},\mathcal{A}]^J$, we have the following inequality

$$\frac{1}{J}\|(\hat a^{\lambda},\hat\alpha^{\lambda}) - (a^*,\alpha^*)\|^2_{\mathbb{R}^{2J}} \le \frac{2}{J}\|(\hat a^{\lambda},\hat\alpha^{\lambda}) - (a^*_{\Theta_0},\alpha^*_{\Theta_0})\|^2_{\mathbb{R}^{2J}} + \frac{2}{J}\|(a^*_{\Theta_0},\alpha^*_{\Theta_0}) - (a^*,\alpha^*)\|^2_{\mathbb{R}^{2J}}.$$

The proof of Theorem 5.1 is a direct consequence of Proposition A.1 and Lemma A.2 below, which control the convergence in probability of the two terms on the right-hand side of the preceding inequality.

Proposition A.1. Consider model (1.2) and suppose that Assumptions 1 and 2 hold and that Assumption 3 is verified with $A, \mathcal{A} < 0.1$. If $\lambda = \lambda(k) = \lfloor k^{\frac{1}{2s+1}} \rfloor$ then there exists a constant $C(L,s,A,\mathcal{A},f_0)$ such that for all $x > 0$

$$\mathbb{P}\Big(\frac{1}{J}\|(\hat a^{\lambda},\hat\alpha^{\lambda}) - (a^*_{\Theta_0},\alpha^*_{\Theta_0})\|^2_{\mathbb{R}^{2J}} \ge C(L,s,A,\mathcal{A},f_0)\Big(F\big(k^{-\frac{2s}{2s+1}}\big) + F(V_1(k,J,x))\Big)\Big) \le 2e^{-x},$$

where $V_1(k,J,x) = 3\gamma_{\max}(k)\,k^{-\frac{2s}{2s+1}}\Big(1 + \sqrt{\tfrac{2x}{J}} + \tfrac{2x}{J}\Big)$ and $F : \mathbb{R}_+ \to \mathbb{R}$, with $F(u) = u + \sqrt{u}$.

The proof of Proposition A.1 is postponed to Section A.5. The following lemma is a direct consequence of Bernstein's inequality for bounded random variables, see e.g. Proposition 2.9 in [Mas07].

Lemma A.2. Suppose that Assumption 3 holds and that the random variables $(a_j^*,\alpha_j^*)$, $j = 1,\ldots,J$, have zero expectation in $[-A,A] \times [-\mathcal{A},\mathcal{A}]$. Then, for any $x > 0$, we have

$$\mathbb{P}\Big(\frac{1}{J}\|(a^*_{\Theta_0},\alpha^*_{\Theta_0}) - (a^*,\alpha^*)\|^2_{\mathbb{R}^{2J}} \ge C(A,\mathcal{A})\Big(\sqrt{\tfrac{2x}{J}} + \tfrac{x}{3J}\Big)^2\Big) \le 4e^{-x},$$

where $C(A,\mathcal{A}) = 4\max\{A^2,\mathcal{A}^2\}$. □

Figure 7: Estimation of the mean pattern with the white noise. Boxplots in red correspond to J = 10, in green to J = 100 and in blue to J = 500.

Figure 8: Estimation of the mean pattern with the stationary noise. Boxplots in red correspond to J = 10, in green to J = 100 and in blue to J = 500.

Figure 9: Estimation of the mean pattern with the correlated noise. Boxplots in red correspond to J = 10, in green to J = 100 and in blue to J = 500.

A.3 Proof of Theorem 5.2

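The proof of Theorem 5.2 follows the same guideline as the proof of Theorem 5.1. Consider the inequality

$$\frac{1}{J}\|\hat b^{\lambda} - b^*\|^2_{\mathbb{R}^{2J}} \le \frac{2}{J}\|\hat b^{\lambda} - b^*_{\Theta_0}\|^2_{\mathbb{R}^{2J}} + \frac{2}{J}\|b^*_{\Theta_0} - b^*\|^2_{\mathbb{R}^{2J}}.$$

Theorem 5.2 is now a direct consequence of Proposition A.3 and Lemma A.4.

Proposition A.3. Under the hypothesis of Proposition A.1, there exists a constant $C(L,s,A,\mathcal{A},B,f) > 0$ such that for all $x > 0$,

$$\mathbb{P}\Big(\frac{1}{J}\|\hat b^{\lambda} - b^*_{\Theta_0}\|^2_{\mathbb{R}^{2J}} \ge C(L,s,A,\mathcal{A},B,f)\Big(F\big(k^{-\frac{2s}{2s+1}}\big) + F(V_1(k,J,x)) + V_2(k,J,x)\Big)\Big) \le 5e^{-x},$$

where $V_1(k,J,x) = 3\gamma_{\max}(k)\,k^{-\frac{2s}{2s+1}}\Big(1 + \sqrt{\tfrac{2x}{J}} + \tfrac{2x}{J}\Big)$, $V_2(k,J,x) = \frac{\gamma_{\max}(k)}{k}\Big(1 + \sqrt{\tfrac{2x}{J}} + \tfrac{x}{J}\Big)$ and $F(u) = u + \sqrt{u}$, $u \ge 0$.

The proof of Proposition A.3 is postponed to Section A.6.

Lemma A.4. Suppose that Assumption 3 holds with $\mathcal{A} < \frac{\pi}{4}$ and that the random variables $(a_j^*,\alpha_j^*,b_j^*)$, $j = 1,\ldots,J$, have zero expectation in $\Theta \subset \mathbb{R}^{4J}$. For any $x > 0$, we have

$$\mathbb{P}\Big(\frac{1}{J}\|b^*_{\Theta_0} - b^*\|^2_{\mathbb{R}^{2J}} \ge C(A,\mathcal{A},B)\Big(\sqrt{\tfrac{2x}{J}} + \tfrac{x}{3J}\Big)^2\Big) \le 4e^{-x},$$

where $C(A,\mathcal{A},B) = 8B^2e^{4A}(\cos\mathcal{A} - \sin\mathcal{A})^{-2}$.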

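Proof. This result is a consequence of Bernstein's inequality for bounded random variables. We have

$$\frac{1}{J}\|b^*_{\Theta_0} - b^*\|^2_{\mathbb{R}^{2J}} = \frac{1}{J}\sum_{j=1}^{J}\Big\|e^{a_j^*}\,\bar b^*\big(\overline{e^{a^*}R_{\alpha^*}}\big)^{-1}R_{\alpha_j^*}\Big\|^2_{\mathbb{R}^2},$$

where $\overline{e^{a^*}R_{\alpha^*}} = \frac{1}{J}\sum_{j=1}^{J} e^{a_j^*}R_{\alpha_j^*}$ is an invertible $2\times 2$ matrix whose smallest eigenvalue (in modulus) is greater than $e^{-A}(\cos\mathcal{A} - \sin\mathcal{A}) > 0$ as $\mathcal{A} < \frac{\pi}{4}$. To see this, remark that the eigenvalues of $\overline{e^{a^*}R_{\alpha^*}}$ are $\frac{1}{J}\sum_{j=1}^{J} e^{a_j^*}(\cos\alpha_j^* \pm i\sin\alpha_j^*)$ and we have $\big|\frac{1}{J}\sum_{j=1}^{J} e^{a_j^*}(\cos\alpha_j^* \pm i\sin\alpha_j^*)\big| \ge e^{-A}(\cos\mathcal{A} - \sin\mathcal{A}) > 0$. We now have

$$\frac{1}{J}\|b^*_{\Theta_0} - b^*\|^2_{\mathbb{R}^{2J}} \le C(A,\mathcal{A})\,\|\bar b^*\|^2_{\mathbb{R}^2},$$

where $C(A,\mathcal{A}) = e^{4A}(\cos\mathcal{A} - \sin\mathcal{A})^{-2}$. Finally, for all $u > 0$ we have $\mathbb{P}\big(\frac{1}{J}\|b^*_{\Theta_0} - b^*\|^2_{\mathbb{R}^{2J}} \ge u\big) \le \mathbb{P}\big(C(A,\mathcal{A})\|\bar b^*\|^2_{\mathbb{R}^2} \ge u\big)$, and a Bernstein type inequality (see e.g. Proposition 2.9 in [Mas07]) gives $\mathbb{P}\Big(\|\bar b^*\|_{\mathbb{R}^2} \ge 2B\Big(\sqrt{\tfrac{2x}{J}} + \tfrac{x}{3J}\Big)\Big) \le 4e^{-x}$, which yields

$$\mathbb{P}\Big(\frac{1}{J}\|b^*_{\Theta_0} - b^*\|^2_{\mathbb{R}^{2J}} \ge C(A,\mathcal{A},B)\Big(\sqrt{\tfrac{2x}{J}} + \tfrac{x}{3J}\Big)^2\Big) \le 4e^{-x},$$

where $C(A,\mathcal{A},B) = 4B^2e^{4A}(\cos\mathcal{A} - \sin\mathcal{A})^{-2}$. □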
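The eigenvalue bound used in this proof can be checked numerically. The sketch below relies on the fact that $\frac{1}{J}\sum_j e^{a_j}R_{\alpha_j}$ has the form $\begin{pmatrix} p & -q \\ q & p \end{pmatrix}$, whose eigenvalues are $p \pm iq$; the function names and the sampled parameter values are hypothetical and serve only as an illustration.

```python
import math
import random

def mean_similarity(a, alpha):
    """Return (p, q) with (1/J) sum_j e^{a_j} R_{alpha_j} = [[p, -q], [q, p]]."""
    J = len(a)
    p = sum(math.exp(aj) * math.cos(al) for aj, al in zip(a, alpha)) / J
    q = sum(math.exp(aj) * math.sin(al) for aj, al in zip(a, alpha)) / J
    return p, q

def eigenvalue_modulus(a, alpha):
    """Common modulus of the two conjugate eigenvalues p + iq and p - iq."""
    p, q = mean_similarity(a, alpha)
    return abs(complex(p, q))

def lower_bound(A, cal_A):
    """e^{-A} (cos(cal_A) - sin(cal_A)), positive when cal_A < pi/4."""
    return math.exp(-A) * (math.cos(cal_A) - math.sin(cal_A))

if __name__ == "__main__":
    rng = random.Random(0)
    A, cal_A, J = 0.1, 0.7, 20   # illustrative values with cal_A < pi/4
    a = [rng.uniform(-A, A) for _ in range(J)]
    alpha = [rng.uniform(-cal_A, cal_A) for _ in range(J)]
    print(eigenvalue_modulus(a, alpha), ">=", lower_bound(A, cal_A))
```

The bound holds for every draw because the modulus is at least the real part $p$, and each term $e^{a_j}\cos\alpha_j$ is at least $e^{-A}\cos\mathcal{A}$.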
A.4 Proof of Theorem 5.3

Recall the notations introduced in Section 2.1: for $g_j = (a_j,\alpha_j,b_j)$ we have $g_j.f = e^{a_j} f R_{\alpha_j} + \mathbf{1}_k \otimes b_j$. Then,
we have



r-t 



' - 1 II 1 

1^X2 fell J 



j^r^Y,-)-/ 



3=1 



3=1 



2 2 

+ 7 

Rfex2 k 



A / — / 



Rfcx2 



3=1 



A A C,. /'•... ., 



{ fcx2 



A / — / 



V B 

The rest of the proof is devoted to controlling the terms $B$ and $V$. The term $B$ is controlled by the bias of the low-pass filter. According to Lemma B.1, if Assumption 1 holds and by choosing the optimal frequency cutoff $\lambda = \lambda(k) = \lfloor k^{\frac{1}{2s+1}} \rfloor$, there exists a constant $C(L,s) > 0$ such that

$$B = \frac{2}{k}\|A^{\lambda}f - f\|^2_{\mathbb{R}^{k\times 2}} \le C(L,s)\,k^{-\frac{2s}{2s+1}}. \tag{A.2}$$



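The term $V$ contains two expressions. To bound the first one we use Bessel's inequality and Lemma B.4. More precisely, we have for all $j = 1,\ldots,J$

$$\frac{1}{k}\big\|A^{\lambda}\big(\hat g_j^{-1}.g_j^*.f - f\big)\big\|^2_{\mathbb{R}^{k\times 2}} \le \frac{1}{k}\big\|\hat g_j^{-1}.g_j^*.f - f\big\|^2_{\mathbb{R}^{k\times 2}}$$
$$\le C(A,f)\,\big\|\big(a_j^* - \hat a_j^{\lambda},\; \alpha_j^* - \hat\alpha_j^{\lambda},\; e^{-\hat a_j^{\lambda}}(b_j^* - \hat b_j^{\lambda})R_{-\hat\alpha_j^{\lambda}}\big)\big\|^2_{\mathbb{R}^4}$$
$$\le C(A,f)\,\big\|(a_j^* - \hat a_j^{\lambda},\, \alpha_j^* - \hat\alpha_j^{\lambda})\big\|^2_{\mathbb{R}^2} + C(A,f)\,\big\|b_j^* - \hat b_j^{\lambda}\big\|^2_{\mathbb{R}^2}.$$

We can now use Theorems 5.1 and 5.2 to derive the following concentration inequality,

$$\mathbb{P}\Big(\frac{1}{Jk}\sum_{j=1}^{J}\big\|A^{\lambda}\big(\hat g_j^{-1}.g_j^*.f - f\big)\big\|^2_{\mathbb{R}^{k\times 2}} \ge C(L,s,A,\mathcal{A},B,f)\big(A_1(k,J,x) + A_3(k,J,x) + A_2(J,x)\big)\Big) \le 13e^{-x}, \tag{A.3}$$

where $C(L,s,A,\mathcal{A},B,f) > 0$ is a constant independent of $k$ and $J$, and $A_1, A_2, A_3$ are defined in the statements of Theorems 5.1 and 5.2. The second term contained in $V$ is treated by equation (A.10) below. Hence, formulas (A.3) and (A.10) yield

$$\mathbb{P}\Big(V \ge C(L,s,A,\mathcal{A},B,f)\big(A_1(k,J,x) + A_3(k,J,x) + A_2(J,x)\big)\Big) \le 14e^{-x}, \tag{A.4}$$

for some constant $C(L,s,A,\mathcal{A},B,f) > 0$. Putting together equations (A.2) and (A.4) gives

$$\mathbb{P}\Big(\frac{1}{k}\|\hat f^{\lambda} - f\|^2_{\mathbb{R}^{k\times 2}} \ge C(L,s,A,\mathcal{A},B,f)\big(A_1(k,J,x) + A_3(k,J,x) + A_2(J,x)\big)\Big) \le 14e^{-x},$$

for some constant $C(L,s,A,\mathcal{A},B,f) > 0$. The proof of Theorem 5.3 is completed. □

A.5 Proof of Proposition A.1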



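The mean pattern $f$ can be decomposed as $f = \bar f + f_0 \in \mathbf{1}_{k\times 2} \oplus \mathbf{1}_{k\times 2}^{\perp}$. Then, $f_0$ is the centered version of $f$ and we can consider the criterion,

$$D_0(a,\alpha) = \frac{1}{Jk}\sum_{j=1}^{J}\Big\|e^{a_j^* - a_j} f_0 R_{\alpha_j^* - \alpha_j} - \frac{1}{J}\sum_{j'=1}^{J} e^{a_{j'}^* - a_{j'}} f_0 R_{\alpha_{j'}^* - \alpha_{j'}}\Big\|^2_{\mathbb{R}^{k\times 2}}.$$

We now have,

$$(\hat a^{\lambda},\hat\alpha^{\lambda}) = \operatorname{argmin}_{(a,\alpha)\in\Theta_0^{a,\alpha}} M_0^{\lambda}(a,\alpha) \quad\text{and}\quad (a^*_{\Theta_0},\alpha^*_{\Theta_0}) = \operatorname{argmin}_{(a,\alpha)\in\Theta_0^{a,\alpha}} D_0(a,\alpha).$$

Then, the convergence of $(\hat a^{\lambda},\hat\alpha^{\lambda})$ to $(a^*_{\Theta_0},\alpha^*_{\Theta_0})$ is guaranteed if $(a^*_{\Theta_0},\alpha^*_{\Theta_0})$ is uniquely defined and if there is a uniform convergence in probability of $M_0^{\lambda}$ to $D_0$, see e.g. [vdV98]. This is the aim of Lemmas A.5 and A.6 below.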

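Lemma A.5. Let $f$ be a non-degenerated configuration in $\mathbb{R}^{k\times 2}$, i.e. $f \notin \mathbf{1}_{k\times 2}$. Then, the argmin of $D_0$ on $\Theta_0^{a,\alpha}$ is unique and denoted by $(a^*_{\Theta_0},\alpha^*_{\Theta_0}) = (a^* - \bar a^*, \alpha^* - \bar\alpha^*)$, where $\bar a^* = \frac{1}{J}\sum_{j=1}^{J} a_j^*$ and $\bar\alpha^* = \frac{1}{J}\sum_{j=1}^{J} \alpha_j^*$.

Proof. As $f$ is a non-degenerated configuration, we have $f_0 \ne 0$. Thus, the stabilizer of $f_0$ is reduced to the identity, see Section 2.1. Then $D_0(a,\alpha) = 0$ if and only if there exists $(a_0,\alpha_0) \in \mathbb{R}^2$ such that $(a,\alpha) = (a^*,\alpha^*) * (a_0,\alpha_0) = (a_1^* + a_0, \ldots, a_J^* + a_0, \alpha_1^* + \alpha_0, \ldots, \alpha_J^* + \alpha_0)$. By choosing $(a_0,\alpha_0) = (-\bar a^*, -\bar\alpha^*)$, we have $\sum_{j=1}^{J}(a_j^*,\alpha_j^*) * (a_0,\alpha_0) = 0$. That is, $(a^*_{\Theta_0},\alpha^*_{\Theta_0}) = (a^*,\alpha^*) * (a_0,\alpha_0) \in \Theta_0^{a,\alpha}$. □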

We now show the uniform convergence in probability.

Lemma A.6. Suppose that Assumptions 1, 2 and 3 hold and let $F : \mathbb{R}_+ \to \mathbb{R}$, with $F(u) = u + \sqrt{u}$.
For any $x > 0$ we have



sup |M A (a,a) -D„(a,a)| > C{L,s,A,f ) (f(AT^) +F(V(k, J,x))) ) < 2e~ x 

^||/o|| Rfc x2,2} andy(fc,J,x) =3 7max (A:)fc-^T(l + 



w/iere C(L,s, A, / ) = e 2A max j^||/ || R kx,. — T 

Proof. Let us write the following decomposition, 
M A (a,a) 



J 



1 J 



i'=i 

1 J * 

e a ^(A x f - f )R a *- aj - - £ eV-V^ _ f )R a} _ a ., 

j'=i 



J'=l 



3' J' 



1 J 



j'=i 
1 1 * 



i'=i 
j 



i'=i 
j 



(A.5) 
(A.6) 

(A.7) 

(A.8) 

(A.9) 



i'=i 



3' r 



Then, the criterion $M_0^{\lambda}$ is viewed as a perturbed version of the criterion $D_0(a,\alpha)$ = (A.5):

$$M_0^{\lambda}(a,\alpha) = D_0(a,\alpha) + B_0^{\lambda}(a,\alpha) + V_0^{\lambda}(a,\alpha),$$

where the bias term is $B_0^{\lambda}(a,\alpha)$ = (A.6) + (A.7) and the variance term is $V_0^{\lambda}(a,\alpha)$ = (A.8) + (A.9).

The bias term. We have for any $(a,\alpha) \in \Theta_0^{a,\alpha}$,



e a ^(^/-/ )i? a *_ Q , 



j'=i 



k 



A$f - f 



A double application of Cauchy-Schwarz inequality implies that, 



i'=i 



J 



j'=l 



2 / 1 



7 ^e^||(^/-/o)%_ Q 



— 2e -||/o|| K fcx2||A / - / || R fcx2. 
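Finally, by using Lemma B.1 we have,

$$\sup_{(a,\alpha)\in\Theta_0^{a,\alpha}} |B_0^{\lambda}(a,\alpha)| \le C_1(L,s,A,f_0)\Big(\sqrt{k^{-\frac{2s}{2s+1}}} + k^{-\frac{2s}{2s+1}}\Big),$$

where $C_1(L,s,A,f_0) = C(L,s)\,e^{2A}\max\big\{\tfrac{2}{\sqrt{k}}\|f_0\|_{\mathbb{R}^{k\times 2}},\, 1\big\}$.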

The variance term. First, the term ()A,9P is by the Cauchy-Schwarz inequality controlled by 2e 2A -^ 

|| f p[|gfcx2 \J (|A.8[) ". The term ()A,8P is bounded from above by X]/=i e 2yi ||^4q ||^fe X 2 • To derive an 
upper bound in probability, note that we have the following equality in law, 

J 

Ell^oC/lliRkxz =£'B£, 



with B = [Idj ® S3] [Jd 2 j ® (A$)'A$\ [Idj ® Si] g ]R2jfcx2jfc and ^ = ^ _ _ .^ 2Jk )' i s a centered 
Gaussian vector of variance 

Id 2Jk - We have tr(Jdj®E) < 2Jkj max (k) and tr((^)'A^) = B±^. Using 
a classical concentration inequality for quadratic form of multivariate Gaussian random variables, see 
e.g. [LMOOj Lemma 1, we have for all x > 0, pff'Bf > 2J7e7 max (/c)^±± + 27 max (/c) Vx2Jk + 



4x7 max (fc) 2A ^ 1 ) < e x , which yields together with formula ()A.2j) . 



Jk 

Hence we have 



^E^II^Cill^xa >^ A lm ^k)k~^(l + ^+2^)\ < e -. (A.10) 



sup |Vq (a, a) + Bg(a, a) 

(a,a)e® a ' a 



> C(L, s, A, /„) (k~^ + k~^ + Vx{k, J, x) + y/vJJ^J^j \ < 2e~ x , 
where C(L,s,A,f ) = e 2A max {-|||/ || Rfcx2 , -^||/ || R , x2 ^2 e A , 2} and Vi^J.x) = 3^^#(l + 
For a fixed $J \in \mathbb{N}$, the convergence of the M-estimator $(\hat a^{\lambda},\hat\alpha^{\lambda})$ to $(a^*_{\Theta_0},\alpha^*_{\Theta_0})$ when $k \to \infty$ is guaranteed by Lemmas A.5 and A.6, see e.g. [vdV98]. Nevertheless, we are able to give a rate of convergence and non-asymptotic bounds in $k$ and $J$ by using the classical inequality,

$$\big|D_0(\hat a^{\lambda},\hat\alpha^{\lambda}) - D_0(a^*_{\Theta_0},\alpha^*_{\Theta_0})\big| \le 2\sup_{(a,\alpha)\in\Theta_0^{a,\alpha}} \big|D_0(a,\alpha) - M_0^{\lambda}(a,\alpha)\big|.$$

This, together with Lemma A.7 below, will prove Proposition A.1.
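For completeness, the classical inequality used above follows from a short standard argument, sketched here with the shorthand $\hat\theta = (\hat a^{\lambda},\hat\alpha^{\lambda})$ and $\theta^* = (a^*_{\Theta_0},\alpha^*_{\Theta_0})$ (shorthand introduced only for this sketch). It uses only that $\hat\theta$ minimizes $M_0^{\lambda}$ and $\theta^*$ minimizes $D_0$ over the same set $\Theta_0^{a,\alpha}$.

```latex
\begin{align*}
0 \le D_0(\hat\theta) - D_0(\theta^*)
  &= \big[D_0(\hat\theta) - M_0^{\lambda}(\hat\theta)\big]
   + \big[M_0^{\lambda}(\hat\theta) - M_0^{\lambda}(\theta^*)\big]
   + \big[M_0^{\lambda}(\theta^*) - D_0(\theta^*)\big] \\
  &\le \big|D_0(\hat\theta) - M_0^{\lambda}(\hat\theta)\big| + 0
   + \big|M_0^{\lambda}(\theta^*) - D_0(\theta^*)\big| \\
  &\le 2 \sup_{(a,\alpha)\in\Theta_0^{a,\alpha}}
       \big|D_0(a,\alpha) - M_0^{\lambda}(a,\alpha)\big| .
\end{align*}
```

The middle bracket is non-positive because $\hat\theta$ minimizes $M_0^{\lambda}$, and the left-hand side is non-negative because $\theta^*$ minimizes $D_0$.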

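Lemma A.7. Assume that $A, \mathcal{A} < 0.1$. There exists a constant $C(A,\mathcal{A}) > 0$ independent of $J$ such that for all $(a,\alpha) \in \Theta_0^{a,\alpha}$ we have

$$\big|D_0(a,\alpha) - D_0(a^*_{\Theta_0},\alpha^*_{\Theta_0})\big| \ge C(A,\mathcal{A},f_0)\,\frac{1}{J}\big\|(a - a^*_{\Theta_0},\, \alpha - \alpha^*_{\Theta_0})\big\|^2_{\mathbb{R}^{2J}},$$

where $C(A,\mathcal{A},f_0) = C(A,\mathcal{A})\,\frac{1}{k}\|f_0\|^2_{\mathbb{R}^{k\times 2}}$.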



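Proof. By definition, given $(a^*,\alpha^*) \in [-A,A]^J \times [-\mathcal{A},\mathcal{A}]^J$, the point $(a^*_{\Theta_0},\alpha^*_{\Theta_0})$ is the unique minimum of $D_0$ on $\Theta_0^{a,\alpha} = \big([-A,A]^J \cap \mathbf{1}_J^{\perp}\big) \times \big([-\mathcal{A},\mathcal{A}]^J \cap \mathbf{1}_J^{\perp}\big)$. Then, for all $(a,\alpha) \in \Theta_0^{a,\alpha}$, there exists a $c = c(a,\alpha) \in \Theta_0^{a,\alpha}$ such that the Taylor expansion of $D_0$ at $(a^*_{\Theta_0},\alpha^*_{\Theta_0})$ can be written,

$$D_0(a,\alpha) - D_0(a^*_{\Theta_0},\alpha^*_{\Theta_0}) = \frac{1}{2}(a - a^*_{\Theta_0},\, \alpha - \alpha^*_{\Theta_0})'\big[\nabla^2 D_0(a^*_{\Theta_0},\alpha^*_{\Theta_0})\big](a - a^*_{\Theta_0},\, \alpha - \alpha^*_{\Theta_0}) + \frac{1}{6}\big[\nabla^3 D_0(c)\big](a - a^*_{\Theta_0},\, \alpha - \alpha^*_{\Theta_0}).$$

Let $\delta = \max\{A,\mathcal{A}\}$. This, together with Lemmas B.2 and B.3, implies that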

D (a,a)- D (a* &Q ,a* &0 ) > ^(a-a* &0 ,at- at* &0 )'[V 2 D (a* &0 ,a.* &0 )](a - a* &0 ,a - a* &0 ) 



o An m n 2 1 11/ * * \ II 

S—e H/ollakxa^ || (a - a 0o ,a - a eo )|| 



^ II £ ll2 1 11/ * * \||2 / -2A 2A\ 

> ||/o||Rfcxa^g||(o-Oe ,a-ae )|| R aj ( e - <5y e ). 



Hence, one can choose 5 > sufficiently small such that (e~ 2A — 5^-e 2A ) is strictly positive for all J 
and k. For example, we have (e _2<5 — 5^Pe 2<5 ) > 0, if 5 < 0.1. Then, using such a <5 it follows that for 
all (a, a) £ 0"'", 

\D (a,a)-D (a* &0 , oc* &0 ) | > C(A, A) - 1| / 1| J fcxa j || (a - a 0Q , a - a 0o ) | |^ 2J . □ 

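The proof of Proposition A.1 is almost done. Remark that Lemma A.7 ensures that for all $u > 0$ we have $\mathbb{P}\big(\frac{1}{J}\|(\hat a^{\lambda},\hat\alpha^{\lambda}) - (a^*_{\Theta_0},\alpha^*_{\Theta_0})\|^2_{\mathbb{R}^{2J}} \ge u\big) \le \mathbb{P}\big(\frac{2}{C(A,\mathcal{A},f_0)}\sup_{(a,\alpha)\in\Theta_0^{a,\alpha}} |M_0^{\lambda}(a,\alpha) - D_0(a,\alpha)| \ge u\big)$. Lemma A.6 ensures that there is a constant $C(L,s,A,\mathcal{A},f_0)$ such that for all $x > 0$,

$$\mathbb{P}\Big(\frac{1}{J}\|(\hat a^{\lambda},\hat\alpha^{\lambda}) - (a^*_{\Theta_0},\alpha^*_{\Theta_0})\|^2_{\mathbb{R}^{2J}} \ge C(L,s,A,\mathcal{A},f_0)\Big(F\big(k^{-\frac{2s}{2s+1}}\big) + F(V_1(k,J,x))\Big)\Big) \le 2e^{-x},$$

where $V_1(k,J,x) = 3\gamma_{\max}(k)\,k^{-\frac{2s}{2s+1}}\Big(1 + \sqrt{\tfrac{2x}{J}} + \tfrac{2x}{J}\Big)$ and $F : \mathbb{R}_+ \to \mathbb{R}$, with $F(u) = u + \sqrt{u}$. □

A.6 Proof of Proposition A.3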

First of all remark that, thanks to formulas (|3.3p and (|4.5p . we have explicit expressions of b@ o = 
argmin bg@ b D(a* ,a* , 5) and of & A = argmin bg 0b M(d A , ck , b). Indeed, we have, 
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where e a * R a * = $ £)/ =1 ^ R ^ G M 2x2 , 5* = ^ ^i=l & j G r2 and 



where e &X R & x = 7 £/ =1 e a ^A G M 2x2 , Yj = ^SiU^fc = e°?/i? a * + 6* + e^Cj^a* G K 2 
where / = UXift 6 K 2 , cj = |£ti[Cjk e K 2 and Y = 7 Sj=i Yj = ~f(e~^R^) + b* + 
7 Ej=i '" : C,/>'m; G M 2 . Thence, we have 

, 1 J 

7 l|fae - 6A Hr- < jEl^X* -c^^'iJaO^^)- 1 ^)!!^ (A.ll) 



+ Hb'^Ce-*^.)" 1 ^; "^(e^i?^)- 1 ^)!! 2 , (A.12) 



+ II^C,^IIm2 + ||e^ (e^^X^X*)"^!!!* (A.13) 



where (e a *CR a *) = j 

Ej=i ^ Cj-^a* G M 2 . The rest of the proof is devoted to the control of the terms 
rfA~TT|) . (|A~T2l) and (|A"T3|) . 

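In this section, we denote by $\|\cdot\|_{op}$ the operator norm of a $2\times 2$ matrix, i.e. $\|A\|_{op} = |\gamma_{\max}(A)|$ where $\gamma_{\max}(A)$ denotes the largest eigenvalue of a matrix $A \in \mathbb{R}^{2\times 2}$. Note by the way that the eigenvalues of the matrix $\frac{1}{J}\sum_{j=1}^{J} e^{a_j}R_{\alpha_j} \in \mathbb{R}^{2\times 2}$ are $\frac{1}{J}\sum_{j=1}^{J} e^{a_j}(\cos(\alpha_j) \pm i\sin(\alpha_j))$ for any $J$ and $(a_1,\ldots,a_J,\alpha_1,\ldots,\alpha_J) \in [-A,A]^J \times [-\mathcal{A},\mathcal{A}]^J$. It yields $\|e^{-a_j}R_{-\alpha_j}\|_{op} \le e^{A}$ and $\big\|\big(\overline{e^{a}R_{\alpha}}\big)^{-1}\big\|_{op} \le e^{A}(\cos(\mathcal{A}) - \sin(\mathcal{A}))^{-1}$, which is a positive real number since by hypothesis we have $\mathcal{A} < \frac{\pi}{4}$. We are now able to derive an upper bound for (A.11).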



dsiu < 11/11^ j En^^ -^V^^^rX* 



J 



J 



J 



+ 2e 2A ||/|| 2 2 ||(e^ii.,)-i 2 p ||( e ^ J R^)-(e' iA ^)|| 2 ? , 
It is now easy to show that there exists a constant C(A, A, f) > independent of k and J such that 

1 J 

{A3U < C(AA/) 7 Ell( a i -S A ,a*-a*)||^. (A.14) 

J J=l 

The term ([A~T2]) is very similar to the term (jA.lip and we have 

1 J 

JA31 < C(A, AS)-^||(a* -a*,a* -d})||* a , (A.15) 
i=i 

for some constant C(A,„4, i3) > independent of k and J. By using formula (fA~14|) . ([A~l~5]) and 

Proposition I A. 1 1 together, there exists a constant C(L,s,A,A,B,f) independent of k and J such that 
for all x > 0, 

$$\mathbb{P}\Big(\text{(A.11)} + \text{(A.12)} \ge C(L,s,A,\mathcal{A},B,f)\Big(F\big(k^{-\frac{2s}{2s+1}}\big) + F(V_1(k,J,x))\Big)\Big) \le 4e^{-x}, \tag{A.16}$$

where $V_1(k,J,x)$ is defined in the statement of Proposition A.1 and $F(u) = u + \sqrt{u}$ for $u \ge 0$.

The term (A.13) can be bounded in probability as follows. First of all, remark that there exists a
constant $C(A,\mathcal{A}) > 0$ such that



m^<c(A,A)\j2\\c j \\i: 

J 3=1 



Then, for all j = 1, . . . , J, the random variable ^ = | X^=i(|Q ]-6 [C* k) ^ ^ 2 can written 

( ^ 2) '4(<y ^V[c! 1) ]i,...jc; i) ] fc jcf ) ]i,...jcf ) ],r 



where 0^ is a column vector of k zeros. The random vector [CjiCj)' ls a two dimensional centered 



Gaussian vector of variance V 

2 



1 W 



The 2x2 matrix V is of trace less 

2 



or equal to §7 max (fe)- Hence, the random variable jX^ = i||Cjll|a = jYlj=i(Cj Xj)(Cj Xj)' 
has the same probability distribution as j£'[Idj (g) V]£ where £ is a centered Gaussian vector of 
variance Idi j. A standard concentration inequality of \ 2 distribution (see e.g. [LM00J Lemma 1) 

is P ({'[Idj ® V]£ > J|7max(A;) + ^ mSiX (kW2Jx + xf 7max (A0) < e"* for any x > 0. It yields that 
there exists a constant C(A, A) > such that for all x > 



(lA~13l> > C(A^) 



Tmax 

(k) 

k 




. X X 

<-> 2 J + J 



< e 



(A.17) 



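To end the proof, remark that for all $u > 0$ we have $\mathbb{P}\big(\frac{1}{J}\|b^*_{\Theta_0} - \hat b^{\lambda}\|^2_{\mathbb{R}^{2J}} \ge u\big) \le \mathbb{P}\big(\text{(A.11)} + \text{(A.12)} + \text{(A.13)} \ge u\big)$. This, together with (A.16) and (A.17), yields that there exists a constant $C(L,s,A,\mathcal{A},B,f) > 0$ independent of $k$ and $J$ such that for all $x > 0$ we have

$$\mathbb{P}\Big(\frac{1}{J}\|b^*_{\Theta_0} - \hat b^{\lambda}\|^2_{\mathbb{R}^{2J}} \ge C(L,s,A,\mathcal{A},B,f)\Big(F\big(k^{-\frac{2s}{2s+1}}\big) + F(V_1(k,J,x)) + V_2(k,J,x)\Big)\Big) \le 5e^{-x},$$

where $V_2(k,J,x) = \frac{\gamma_{\max}(k)}{k}\Big(1 + \sqrt{\tfrac{2x}{J}} + \tfrac{x}{J}\Big)$, and the proof of Proposition A.3 is completed. □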



B Technical Lemma 

Lemma B.l. Assume that Assumption^ holds, i.e. f G H S (L) with s > | (see (jl.8p ) and f = 

(f(i))e=l e P fex2 - If ^{k) = |_/c 2s+1 J i/ien tftere ezisis a constant C(L,s) such that for all f G H S (L) 
we have 



A x f-f\\ 2 Rkx2 <C(L,s)k-—\ 



1 



where A A is t/ie projection matrix defined in (|4.2[) . 

Proof. Recall the notations introduced Sections 11.11 and 14.11 : c m (f) = (c m (f^),c m (f^)) G C 2 is the 
m-th Fourier coefficient of / G L 2 ([0, 1], M 2 ) and c m (f) = [cm (/) > Cm' (/)) 6 C 2 is the m-th discrete 
Fourier coefficient of / G M fex2 . Thus, we have by Parseval's equality 



A / — / 



__ 1 

cx2 fe 



lE^H' 2 = ^E 

|m|>A KfeX2 * H>A 



Cm(/) 



where for all c = (c^ 1 ),^ 2 )) G C 2 , ||c|| 2 2 = \c^\ 2 + | C ( 2 )| 2 . It yields 

i||A A / - /|| 2 fcx2 < 2 \\Wif) ~ Cm(f)\\h + 2 E IM/)Hc»- 



|m|>A 



H>A 
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Firstly, Lemma 1.10 in |Tsy09| ensures that if s > \ then \\c m (f^) - c m (f^)\ < C{L,s)k~ s+ ^ for 

i 

any m G N and i = 1,2. Secondly, equation (1.87) in |Tsy09| ensure that if \{k) = |_& 2s+1 J then 

S|m|>Al Cm (/ ( '*' > )| 2 — C(L, s)/c _2s + 1 . Thence, there is a constant C(L,s) independent of k such that 



If s > | then we have 



i||A A / - f\\l kx2 < C(L, s)(k 2 - 2s + k'W- 



±\\A x f-f\\ 2 Rkx2 <C(L, S )k-^Ti. □ 



We now give a general expression of the gradient and of the Hessian of the criterion D. Recall that 
we have 

i J i i 

D{a, a ,b) = -J2 ^ (e°* fR a * + Ifc ® {b* - bj))R- aj + j ^ e ~ V ( eV f R «*. + lfe ® ^ ~ b J')) R -« i > 

3=1 3''=1 

where (a, a, 6) = (ai, . . . , aj, a\, . . . , aj, fei, . . . , 6j) G R J x [— 7r, vrf'^xIR 2 ^. To shorten the formulas 
below we note gj = (gf> \gf\gf\gf') = {ti r <\,.bj). that is gf> = aj, gf 1 = aj, gf> = bf> and 

gf = bf. Let f g . = g- l .g*.f = e~ a * {e a *i f R a * +\ k ® (b* - bj))R- aj , and for all j 1 = l,...,J and 
P! = l,...,4, 

d g9>l) D(a,a,b) = ^(d g ^ ) f g . i ,f gh -jYl fas' )_,..„■ 



The second order derivatives are 



^ ( P2 )d (Pl) D(a,a,b) = —^rid ( Pl )fg h ,d ( P2 )fg n ) ifjl^Ja, (B.2) 

J 



+ (i-A)(^r ) ^'^)^ 1 ) Rfcx2 - (b.3) 

The expressions of the gradient and the Hessian of D simplify on the set (a*, at*, b*)*Q, see Lemma 
3J] For any g = (a ,a ,b ) G R x [-7r,vr[xM 2 , we have f g * go - j E/'=i %,-so = e ~ a °(f ~ 1 k <8> 

b )R- ao - -j E/'=i e" ao (/ - Ifc <8> b )R„ ao = 0. It yields that for all (a ,a ,&o) G K x [-7r,vr[xM 2 we 
have VD((a*,a*,b*) * (a , «o, &o)) = 0, and 



djp 2 )d(p 1 )D((a*,at*,b*) * (a ,a ,b )) = < 



-jk\ d a M fat -9o > 5 ( P2 ) / s ;„ .go ) > if h + h , 

\ 9 J1 J1 9 32 J2 I Rfcx2 

,7k ~ 7%) ( ® g ( p ^f 9* h -go i d g (p2)f g* h -go 



(B.4) 

Lemma B.2. The smallest eigenvalue of $\nabla^2 D_0(a^*_{\Theta_0},\alpha^*_{\Theta_0})$ restricted to the subset $\Theta_0^{a,\alpha}$ is greater than $e^{-2A}\frac{2}{Jk}\|f_0\|^2_{\mathbb{R}^{k\times 2}}$.

Proo/. In this proof, h, = (a, a) G R J x [-vr, vr[ J and f° h . = e a ^- a ^ f R a *_ a .. We have d aSl f hj = ~fh n 
and d aji ffo. = /°. i?_ | for all = 1, . . . , J. By using Formulas (jB.4|) . the Hessian of Dq at the 
point (ag o ,ae ) = (a* — a*, a* — at*) G Qq ,Q! is given by, 



d au d a .D (a* - a*, at* - at*) = d a d a .D (a* - a*, at* - a* 



,7k ~ 7%) I! 6 " / o|lKfex2 
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and the second order cross derivatives are 0. The Hessian of Dq at (a* — a* ,a* — a*) can be written, 



V 2 A)0* - a*, a* - a* 



2 ii/, ii2 fJIdj-tj xJ 



V 





JIdj - lj x j 



(B.5) 



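where $\mathrm{Id}_J$ is the identity in $\mathbb{R}^J$ and $\mathbf{1}_{J\times J}$ is the $J\times J$ matrix of ones. The eigenvalues of $J\,\mathrm{Id}_J - \mathbf{1}_{J\times J}$ are 0 with eigenvector $\mathbf{1}_J$ and $J$ with eigenspace $\mathbf{1}_J^{\perp}$. It yields that on $\Theta_0^{a,\alpha}$ the smallest eigenvalue of $\nabla^2 D_0(a^* - \bar a^*, \alpha^* - \bar\alpha^*)$ is $e^{2\bar a^*}\frac{2}{Jk}\|f_0\|^2_{\mathbb{R}^{k\times 2}}$. To finish the proof, remark that $|\bar a^*| \le A$. □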
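The spectral claim about $J\,\mathrm{Id}_J - \mathbf{1}_{J\times J}$ is easy to check numerically. The following minimal sketch builds the matrix with plain lists (the helper names are hypothetical) and verifies that the vector of ones is mapped to zero while any vector with zero sum is multiplied by $J$.

```python
def j_id_minus_ones(J):
    """Return the J x J matrix J * Id_J - 1_{JxJ} as a list of rows."""
    return [[(J if i == j else 0) - 1 for j in range(J)] for i in range(J)]

def matvec(M, v):
    """Plain matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

if __name__ == "__main__":
    J = 5
    M = j_id_minus_ones(J)
    ones = [1] * J
    centered = [1, -1, 0, 2, -2]   # sums to zero, so it lies in the orthogonal of 1_J
    print(matvec(M, ones))         # eigenvalue 0
    print(matvec(M, centered))     # equals J * centered, eigenvalue J
```

Since the constraint set forces $\sum_j a_j = 0$ and $\sum_j \alpha_j = 0$, only the eigenvalue $J$ is relevant on $\Theta_0^{a,\alpha}$, which is why the Hessian is positive definite there.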

Lemma B.3. Let, 5 = max{j4, A}. For all c = (oi, . . . , aj, ai, . . . , aj) G @q ,Q! and (a, a) G R 2J we 

1 ,2 



|[V 3 A,(c)](a,a)| <40Je 2A ||/o|| 2 fc x 2 -^|Ka,a jiii 
Proof. In this proof, /i = (a, a) G x [— 7T,7r[ J , that is /i^. = a,- is the parameter of scaling and 



hj = aj is the parameter of rotation. We have = e a i aj foR a *- a ■ Then, from equations (|B.2p 



_ o a!-o-j 



and ()B.3p . it follows that for all ji, j2> J3 = 1, . . . , J and Pi,P2,P3 = 1, • • • ,2, 
d h { P3 )d h ( P2 )d h ( Vl) D (a, a.) = 0, if ji ^ j 2 and j* 2 / j 3 and ji / j 3 , 



33 n 3i 



d h (p 3 )d h ( P2 )d h{Pl) D (a,a) 

32 31 31 



d h ( P3 )d ( P2 )d {Pl) D (a,a) 

31 31 31 



<J l\, \ .11 11 



if Ji 7^ J2, 



J/. \ 9 h (P3) d h (P2)d h (Pl) fhi, , ( /ft 



« J 



i'=i 7 



31 31 



By Cauchy-Schwarz inequality we have, 



9 h M d h Mfh h > d h (P3)fh j2 
31 31 32 



< 



^. (p 2 )/L. 
n n 1 / ro'=x2 



pO 



31 31 31 



31 31 



/, , ! Rkx2 \\d h (P3)fhj 3 I 



<e 2A ||/ ||Lx2. (B.6) 



and 



d h (P 3 )d h( p 2) d h(Pl) f hji , (f° hn ] 



31 31 31 



J'=l 7 



< 



d h (p 3 )d ( P2) d iPl) f h 

31 31 31 



/L-tIX< rofex2 * ^||/o||^ X 2 (B.7) 



For k = (ki, . . . , K2,j) G N 2>7 , denote by |k| = k\ + . . . + k 2 j and 

= (da.THda.T 2 ■ ■ ■ (d aj T 2J -'(d aj r^. 

Then, the differential of order 3 of Dq at c G ©q'" applied at (a, a) G M. 2J writes as 

[V 3 D (c)](a, a )= J2( 9 h) K Do(c)h K 

\k\=3 
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where h K = a^ct^ 2 . . . a K f J 1 a J 2J . This formula together with equations (|B.6P and (|B.7[) give, 



[V 3 A(c)] (a, a) = EE ® fc C«i)® fc wfl h ^) A)^)^^^ 



Pl,P2,P3 = lil=l 



31 31 31 



+ 3 E ^s^^AiCc)^^^^ 



32 31 31 



^ 0e 2A 1 || i|2 il(pi)l(P2). (ps)| , 6 |l(pi)l(P2)7 (p 3 ) 

Pl,P2,P3 = l jl=l ji¥=h=i 

E (SEi^'i + 'V '' 



Pl,P2 = l jl=l 
J 2 



ElWl) 

ii=i 7 



< ^ill/oll^j E E i^^ 1 



J / 2 



J 2 



2Me^i||/o[|^x4E ( El^l) * 405^ll/ol&«jE El fc J 



>l)|2 



j=l v pi=l ' j=lpi=l 

And the claim is now proved. 

Lemma B.4. For all f G R kx2 and (a x , a x , &i), (a 2 , a 2 , 62) G x [-.4,-4] x R 2 , Zef 

e ai fR ai + lfe(8)6i, i = 1,2. Then, we have 

1 

^ e r e C7(A/) = 2max{4e 4 ^i||/||2 fcx2 ,l}. 



□ 



■llfi-/ - 52-/||Lx2 < C(^,/)[|(oi,ai,6i) - (a 2 ,a 2 , Mir 



Proof. We have 



~l|<7i-/ - 52./||^x 2 < 2e 2A i||e^ — //?, 



• M-aa ~~ /llmfcxa + 2— 1 1 H fc <8> (61 - 6 2 ) ||^ fcx ^ 



(B.8) 



Let now F(a, a) 



^ k \\e«fR 



We have \d a F(a, a 



\(e a fR a ,e a fR 



f) 



< 



and \d a F(a, a) 



The Euclidean norm of the gradient of F satisfies 



Vk\\e a fR a 

e a fR a+ i , e a fR a - f) R kx2 1 < ^= 



||VF(a,a)|| R2 = y/\d a F(a,a)\ 2 + \d a F(a,a)\ 2 < v^-^ 



Since we have \F(a,a)\ = \F(a,a) -F(0,0)| < V^e A ^ 



s/k 



(a,a)|| R 2, equation (|B.8[) yields 



" 52-/11^x2 < 2max {4e 4A i||/|| 2 fcx2 , l} (h - a 2 | 2 + |ai - a 2 \ 2 + \b? - b^\ 2 + \b? - bf\ 2 ), 



k 



which concludes the proof. 



□ 
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